AD-A177  124 


January 1987

UILU-ENG-87-2207
ACT-75

COORDINATED SCIENCE LABORATORY
College of Engineering
Applied Computation Theory

PRECEDENCE-CONSTRAINED
SCHEDULING WITH
MINIMUM TIME
AND COMMUNICATION

Marsha Lise Prastein

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Approved for Public Release. Distribution Unlimited.


REPORT DOCUMENTATION PAGE

1a. Report Security Classification: Unclassified
1b. Restrictive Markings: None
2a. Security Classification Authority: N/A
2b. Declassification/Downgrading Schedule: N/A
3. Distribution/Availability of Report: Approved for public release; distribution unlimited
4. Performing Organization Report Number(s): UILU-ENG-87-2207 (ACT-75)
5. Monitoring Organization Report Number(s): N/A
6a. Name of Performing Organization: Coordinated Science Lab, University of Illinois
6c. Address (City, State and ZIP Code): 1101 W. Springfield Avenue, Urbana, Illinois 61801
7a. Name of Monitoring Organization: Office of Naval Research
7b. Address (City, State and ZIP Code): 800 N. Quincy Street, Arlington, VA 22217
8a. Name of Funding/Sponsoring Organization: Joint Services Electronics Program & ONR
8c. Address (City, State and ZIP Code): 800 N. Quincy Street, Arlington, VA 22217
9. Procurement Instrument Identification Number: ONR: N00014-85-K-0570; JSEP: N00014-84-C-0149
10. Source of Funding Nos. (Task No., Work Unit No.): N/A
11. Title (Include Security Classification): Precedence-Constrained Scheduling with Minimum Time and Communication
12. Personal Author(s): Marsha Lise Prastein
13a. Type of Report: Technical
14. Date of Report (Yr., Mo., Day): January 1987
15. Page Count: 69
18. Subject Terms: scheduling, precedence, task assignment, communication complexity, computational complexity, NP-completeness, graph, tree

ABSTRACT


I consider task systems modeled by directed acyclic graphs in which nodes represent tasks
and arcs express precedence constraints, and each task can be computed by a processor in one
unit of time. It is known that if there are only two processors or if the graph is a tree, then
there are polynomial time algorithms for scheduling the graph in minimum time, but in general
the minimum time scheduling problem is NP-complete. The communication cost of a schedule is
the number of pairs (p, x) such that processor p does not compute task x but computes an
immediate successor of x; that is, the result of x must be communicated to p. I consider the
problem of finding schedules that minimize finishing time and, among those, finding schedules
that minimize communication. I prove that the problem with two processors on an arbitrary
graph is NP-complete. The problem with arbitrarily many processors on a tree is also
NP-complete. The case of two processors on a tree is open in general, but I establish tight bounds
for two processors on the indirected complete ternary tree of height k: for minimum time,
communication $k - \log_3 k + 3$ is achievable, and communication $k - \log_3 k + 1$ is necessary.
CHAPTER 1

INTRODUCTION/LITERATURE REVIEW

1.1  Introduction 

The advent of multiprocessor networks has introduced potential computing capabilities
previously undreamed of. Before we can make full use of these capabilities, however, we have
much to learn about the nature of parallel computing. It is well known that although in the
ideal situation system throughput increases linearly with the number of processors, in reality
this does not occur. In fact, often after the first few processors, adding additional processors to a
system may decrease system throughput. The reason for this, according to Chu et al. [Chu 1980],
is interprocessor communication: users tend not to schedule the tasks among processors in
such a way as to minimize final completion time as determined by actual task computation time
and communication between processors.

In this thesis, I study the difficulty of scheduling multitask jobs on multiprocessor systems
to minimize completion time and communication cost. A multitask job is represented by a
directed acyclic graph (dag) in which each node represents a task, and an arc from node v to
node u means that node v must be computed before node u, and the result of computation of v
must be known by the processor computing u. To obtain a schedule that minimizes both
processing completion time and communication cost (defined as the number of times that any
processor must pass the result of any of its computations to another processor so that the second
can compute a task) is an NP-hard problem. I have shown this to be true for arbitrary graphs
on two processors, although Coffman and Graham have devised a polynomial time algorithm
for the problem when communication is not considered [Coffman 1972]. Furthermore, I show the
problem is still NP-hard for graphs with degree restricted to two. Although the case of two
processors computing a graph limited to a tree remains an open problem, I present upper
and lower bounds on communication for minimum time schedules for complete binary and
ternary trees. I determine the lower bound for ternary trees by showing a new partition result
for complete ternary trees using a combinatorial method that may be of independent interest.

1.2  Scheduling 

Scheduling and task distribution (among processors) are two problems that have been
studied fairly extensively in the last ten years, although much of the work on the two problems
has been disjoint. Section 1.2 addresses scheduling literature. In Section 1.3, I review some of
the research in communication minimization. In studying task scheduling, we view the overall
job as a directed acyclic graph, where each node in the graph represents one task, or part of the
overall job to be completed. Thus, when I use the term node, I will mean the task represented
by that node. Arcs in the graph represent precedence constraints, i.e., (u, v) is in the graph if
task u must be computed before computation of task v can be started. Thus, the graph specifies
a partial order on the tasks. In the general (unit time execution) scheduling problem as defined
by Ullman [Ullman 1975] we are given a set S of tasks, a partial order $\prec$ on S expressing
precedence constraints, a number k of identical processors, and a time limit t. The question is
whether there is a total function $f : S \to \{0, 1, \ldots, t-1\}$ such that

1. If $u \prec v$, then $f(u) < f(v)$, $\forall u, v \in S$.

2. $\forall i$, $0 \le i < t$, there are at most k values of v ($v \in S$) for which $f(v) = i$.


Paraphrased, the problem is to determine whether the tasks can be scheduled on the k
processors within time t such that for any two tasks u and v in S, if $u \prec v$, then u is
scheduled before v, and at each instant, each processor is assigned to at most one task. Ullman
shows that this unit time precedence-constrained scheduling problem is NP-complete.
Furthermore, he goes on to show that two-processor scheduling with weights of 1 and 2, that is,
the problem stated above extended to allow tasks either one or two units of time to compute, is
NP-complete.
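
The two feasibility conditions of the unit time problem are easy to check for a given candidate schedule; the difficulty lies in finding one. The following is a minimal sketch of such a check, assuming time slots are numbered 0 through t − 1 and that the partial order is supplied as a list of (u, v) pairs with u before v. The function name `is_feasible_schedule` and the four-task example are hypothetical and do not come from Ullman's paper.

```python
from collections import Counter


def is_feasible_schedule(tasks, precedes, k, t, f):
    """Check the two conditions of the unit time scheduling problem.

    tasks    -- iterable of task names (the set S)
    precedes -- iterable of (u, v) pairs meaning u must run before v
    k        -- number of identical processors
    t        -- time limit
    f        -- dict mapping each task to its time slot
    """
    # f must be total and map every task into {0, ..., t-1}.
    if any(v not in f or not (0 <= f[v] < t) for v in tasks):
        return False
    # Condition 1: u before v in the partial order implies f(u) < f(v).
    if any(f[u] >= f[v] for (u, v) in precedes):
        return False
    # Condition 2: at most k tasks share any one time slot.
    load = Counter(f[v] for v in tasks)
    return all(count <= k for count in load.values())


# Hypothetical example: a and b both precede c, and c precedes d.
tasks = ["a", "b", "c", "d"]
precedes = [("a", "c"), ("b", "c"), ("c", "d")]
f = {"a": 0, "b": 0, "c": 1, "d": 2}
print(is_feasible_schedule(tasks, precedes, k=2, t=3, f=f))  # True
print(is_feasible_schedule(tasks, precedes, k=1, t=3, f=f))  # False
```

With one processor the same assignment fails condition 2, because tasks a and b share time slot 0.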


This last result is moderately surprising since Coffman and Graham construct a polynomial
time algorithm for two processor scheduling if all tasks have equal execution times [Coffman
1972]. The algorithm for computing the schedule consists of two parts. First the tasks are all
labeled with positive integers in such a way that $\forall u \in S$, $\forall v \in S$ such that $u \prec v$, the label of
u is greater than the label of v. The second part of the algorithm is processor assignment.
Whenever a processor becomes available, it computes the task u with highest label such that
there is no task $v \prec u$ that has not yet been computed. Ullman's results show that simply
allowing some jobs to require computation time of 2 makes the problem significantly more
difficult.

More recently, Gabow has developed an almost linear time algorithm for unit execution
time two processor scheduling [Gabow 1982].

If we require the graph representing the partial ordering for the unit time execution
scheduling problem to be a tree, we can again design a polynomial time algorithm for the
problem regardless of the number k of processors [Hu 1961]. Hu gives an algorithm for
scheduling indirected trees, i.e., trees T with root r such that for all nodes u in T, $u \prec r$, in
linear time. I have not been able to devise a poly-time algorithm for the same problem taking
communication costs into account. Like the Coffman-Graham two processor scheduling
algorithm, Hu's algorithm requires labeling all nodes of the graph with integers. In this case,
each node u is given the label d(u) + 1 where d(u) is the length of the path from u to the
root. For the actual schedule, define an available node to be any node u such that all of u's
predecessors have been computed; then at each time unit compute the k available nodes with
maximum labels. In the case of ties, decisions are arbitrary. If fewer than k nodes are available,
then compute all of the available nodes.
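
The level-based rule just described can be sketched in a few lines. This is an illustrative sketch only, not Hu's original formulation: it assumes the indirected tree is given as a hypothetical `parent` dictionary mapping each non-root node to its single immediate successor, it breaks ties by node name, and it makes no attempt to run in linear time.

```python
def hu_schedule(parent, k):
    """Level-based list scheduling of an indirected tree on k processors.

    parent -- dict mapping each non-root node to its immediate successor
    Returns a list of time steps, each a list of the nodes computed then.
    """
    nodes = set(parent) | set(parent.values())

    def label(u):
        # label(u) = d(u) + 1, where d(u) is the path length from u to the root
        d = 0
        while u in parent:
            u = parent[u]
            d += 1
        return d + 1

    labels = {u: label(u) for u in nodes}

    # pending[u] counts the predecessors of u that are not yet computed
    pending = {u: 0 for u in nodes}
    for child in parent:
        pending[parent[child]] += 1

    done = set()
    schedule = []
    while len(done) < len(nodes):
        available = [u for u in nodes if u not in done and pending[u] == 0]
        # Highest labels first; ties are arbitrary, broken here by name.
        available.sort(key=lambda u: (-labels[u], str(u)))
        step = available[:k]  # all available nodes if fewer than k exist
        for u in step:
            done.add(u)
            if u in parent:
                pending[parent[u]] -= 1
        schedule.append(step)
    return schedule


# Hypothetical example: complete binary indirected tree with root r.
parent = {"a": "e", "b": "e", "c": "f", "d": "f", "e": "r", "f": "r"}
for time, step in enumerate(hu_schedule(parent, k=2)):
    print(time, step)
```

On this seven-node tree the sketch needs four time units with two processors and three time units with four processors. Both are the minimum possible, since the longest path has three nodes and two processors can compute at most six nodes in three time units.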
1.3 Communication Minimization

Whereas scheduling problems concern determining at what time each of several or many
tasks should be computed, communication minimization problems concern processor allocation.
Much of the previous work in this field does not consider precedence constraints among the
various tasks at all, but aims primarily at minimizing computation cost. The system has two
kinds of nonnegative contingent costs: a processing cost for each pair (P, u) where P is a
processor and u a task in the system, and a communication cost for each pair (u, v) where u and
v are tasks. The processing cost is incurred when task u is computed on processor P, while the
communication cost is incurred when tasks u and v are computed on different processors. If the
communication cost for a given pair (u, v) is infinite, then nodes u and v must be computed by
the same processor. Similarly, if the processing cost for a pair (P, u) is infinite, then node u
cannot be processed on processor P. Total inter-processor communication cost is the sum of all
communication costs incurred in a given schedule. The computation cost is defined as the sum of
the processing cost for each task on the processor to which it is assigned, and total
inter-processor communication cost.

Chu et al. present a number of strategies for the task allocation problem [Chu 1980]. The
first strategy takes a network flow approach to minimizing total cost (as defined above) in a two
processor system [Stone 1977]. In this method, the entire job is represented as a graph with each
node representing a task to be performed. The graph has additional nodes P and Q, one for each
of the processors. Unlike the directed graphs used for the scheduling problem, in this graph, the
edges are undirected and weighted. For every pair of tasks u and v, the graph has an edge (u, v)
with weight equal to the incurred communication cost when u and v are computed by different
processors. Furthermore, for each task u the graph has edges (P, u) and (Q, u) with weights
described below. Edge (P, u) has weight equal to the processing cost for (Q, u). Similarly, edge
(Q, u) has weight equal to the processing cost for (P, u). Minimizing total cost as defined above is a
minimum cut problem for this graph.
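
The correspondence between cuts and assignments can be checked on a small instance. The sketch below builds the weighted graph described above and compares the weight of each cut separating P from Q with the total cost of the matching assignment. It enumerates all assignments by brute force in place of a maximum flow algorithm, so it only illustrates the construction. The function names and the cost figures are hypothetical, and task names are assumed to differ from "P" and "Q".

```python
from itertools import product


def build_graph(tasks, proc_cost, comm_cost):
    """Return the undirected weighted edges of the two-processor graph."""
    edges = {}
    for u in tasks:
        edges[("P", u)] = proc_cost[("Q", u)]  # cut when u sits on Q's side
        edges[("Q", u)] = proc_cost[("P", u)]  # cut when u sits on P's side
    for (u, v), c in comm_cost.items():
        edges[(u, v)] = c                      # cut when u and v are separated
    return edges


def cut_weight(edges, side):
    """Sum the weights of edges whose endpoints lie on different sides."""
    return sum(w for (a, b), w in edges.items() if side[a] != side[b])


def total_cost(assign, proc_cost, comm_cost):
    """Processing cost plus inter-processor communication cost."""
    cost = sum(proc_cost[(p, u)] for u, p in assign.items())
    cost += sum(c for (u, v), c in comm_cost.items() if assign[u] != assign[v])
    return cost


def best_assignment(tasks, proc_cost, comm_cost):
    """Brute-force minimum cut over all 2^n task assignments."""
    edges = build_graph(tasks, proc_cost, comm_cost)
    best = None
    for choice in product("PQ", repeat=len(tasks)):
        assign = dict(zip(tasks, choice))
        side = dict(assign, P="P", Q="Q")
        weight = cut_weight(edges, side)
        assert weight == total_cost(assign, proc_cost, comm_cost)
        if best is None or weight < best[0]:
            best = (weight, assign)
    return best


# Hypothetical costs for three tasks.
tasks = ["a", "b", "c"]
proc_cost = {("P", "a"): 1, ("Q", "a"): 5, ("P", "b"): 4,
             ("Q", "b"): 2, ("P", "c"): 3, ("Q", "c"): 3}
comm_cost = {("a", "b"): 2, ("b", "c"): 1}
print(best_assignment(tasks, proc_cost, comm_cost)[0])  # 8
```

The inner assertion holds for every assignment, which is the reason a minimum cut yields a minimum cost assignment.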

Unfortunately, this approach is quite limited since it handles only two processor systems.
The next approach presented by Chu et al. [Chu 1980], an integer programming approach, is
much more flexible and expandable. In this method, communication costs between all pairs of
tasks are represented in a volume matrix, and the objective function is the sum of individual
processing costs and communication costs. This method is more flexible by virtue of the fact
that it allows more than two processors, and other limits can be programmed into the system
simply by adding constraint equations for each. Of course, the major disadvantage of this
approach is that the general integer programming problem is well known to be NP-hard [Garey
1979].

Lo, in her doctoral thesis [Lo 1983], also addresses the problem of minimizing
communication cost in a multiprocessor system. She presents a heuristic algorithm, Algorithm
A, for obtaining a near-optimal processor assignment for systems with more than two
processors. Algorithm A consists of three parts: Iterative, Lump, and Greedy. Iterative is a
generalization of the network flow approach suggested earlier to give an optimal assignment for
"some" of the tasks. Lump determines a lower bound on the total cost of a k-way cut (where k
is the number of processors) of the remaining unassigned tasks, and based on this number
decides either to lump all of the tasks in a group assigned to one processor or to let Greedy
complete the assignment. Greedy clusters those tasks between which communication costs are
"large" and assigns all tasks in a single cluster to a single processor.

None of the approaches suggested thus far address the question of overall completion time.
If processing costs for individual tasks on given processors are treated as constant and if all
processors are identical, these algorithms would simply assign all tasks to one processor and save
all communication costs. This is not a realistic approach since the goal in minimizing
computation cost is ultimately to minimize overall completion time. Lo also addresses this
issue, however [Lo 1983]. She points out that it is very difficult to schedule n tasks on k
processors (n and k positive integers) in a balanced way. Her model for scheduling to minimize
communication and completion time includes a set S of n (disjoint) tasks each with finite
execution time, k identical processors, and contingent communication costs $c_{uv}$ for every pair of
tasks (u, v), u and $v \in S$. The cost of a processor allocation A is defined to be the maximum
over all processors P of the sum of processing costs of all tasks computed by P and the
communication costs incurred by tasks computed by P. The problem, then, is to devise an
allocation to minimize this maximum over all possible processor allocations for the given tasks
(i.e., it is a minimax problem). Lo points out that even without the contingent communication
cost constraint, the problem is NP-hard. The addition of communication costs produces a more
difficult problem. If, however, execution times for all jobs are identically equal to a constant t,
and similarly if the contingent communication cost between every pair of tasks is equal to a
constant c, then if $\frac{t}{c} > \frac{k}{n}$, an optimal schedule is determined by assigning the tasks in a
round robin manner; if instead $\frac{t}{c} \le \frac{k}{n}$, then an optimal assignment consists of computing all
tasks on a single processor. Lo goes on to give some heuristic results for cases when computation
times and contingent communication costs are not as well behaved.


1.4  Other  Related  Communication  Results 

Papadimitriou and Sipser investigate communication from a slightly different angle
[Papadimitriou 1984a]. They investigate a two processor system that uses 2n input bits divided
between the processors such that each processor has an n-bit input string. They describe a
communication complexity hierarchy based on the number of bits that must be passed between
the two processors in order to solve a problem (expressed in terms of language recognition) that
is a function of all 2n bits. COMM(f(n)) is defined to be the set of languages that can be
recognized by passing exactly f(n) bits between the two processors, regardless of the original
partition of the 2n bits between them. Papadimitriou and Sipser go on to establish several facts
about the communication complexity hierarchy that parallel the classic time-space hierarchy.
For example, $\mathrm{COMM}(n-1) \ne \emptyset$, and $\mathrm{NCOMM}(f(n)) \subseteq \mathrm{COMM}(2^{f(n)})$.

Papadimitriou and Tsitsiklis address the problem of dividing a job's inputs between two
processors assigned to compute it in order to minimize communication necessary between the
processors in order to complete the job. This is an NP-hard problem [Papadimitriou 1982].
Comparing this result with earlier ones, we see that not only is it difficult to minimize
communication necessary between processors when contingent communication between tasks is
known, but minimizing those contingent costs is also difficult.


1.5  Other  Related  Work 

Thus far, we have considered time scheduling and processor allocation entirely disjointly
from all other issues, although Chu et al. do point out that the linear programming approach to
minimizing overall cost considering communication costs is expandable [Chu 1980]. Garey and
Johnson consider a similar issue in time scheduling, i.e., the problem of scheduling to minimize
completion time when there are additional resource constraints. In this model we are given a set
S of tasks and a number k of processors. Each $u \in S$ has an associated completion time c(u),
and for each of r resources, a nonnegative resource requirement $R_i(u)$. The tasks are related by
a partial order $\prec$ as in scheduling problems discussed earlier. Furthermore, we are given a time
limit t and an integer $b_i$ for each resource $R_i$, $1 \le i \le r$. The $b_i$ represent the amount of the
associated resource available in the system. The problem is to achieve a schedule defined as a
function $f : S \to \{1, \ldots, t\}$ with the following stipulations:

1. $\forall u, v \in S$, $u \prec v$ implies $f(u) + c(u) \le f(v)$.

2. At any time, the number of tasks being executed is at most k.

3. Define $S_i$ to be the set of $u \in S$ such that $f(u) \le i < f(u) + c(u)$. Then
$\forall i, j$, $1 \le i \le t$, $1 \le j \le r$, $\sum_{u \in S_i} R_j(u) \le b_j$.

Since the general scheduling problem is NP-complete, so is this limited resource scheduling
problem. Garey and Johnson prove that it is still NP-complete when k = 2, $c(u) = c(v)$
$\forall u, v \in S$, and $\prec$ defines a forest whenever r > 0. In contrast, if k = 2, c is a constant
function, and $\prec$ is empty, then $\forall r > 0$ there is a polynomial time algorithm to schedule the
tasks. If k = 3, then the problem again becomes NP-complete, even for r = 1.

1.6  Scheduling  and  Communication  Minimization 

The problem of combining finding a minimal time schedule with a minimal communication
processor allocation is, of course, at least as difficult as either of its parts, and is a combination of
some of these previous problems. Also, to speak of "minimizing time and communication" can be
misleading. If all tasks are computed by the same processor, then communication costs are
eliminated. Unfortunately, though, the system may not be able to complete the entire job in
minimum time with most of the processors idle during the entire computation. Papadimitriou
and Ullman discuss this time-communication tradeoff in relation to diamond dags
[Papadimitriou 1984b]. A diamond dag is a directed acyclic graph (dag) that is the
generalization of Figure 1 below to $n^2$ nodes. Each node of the dag represents one task needing
computation, and the edges specify a partial order on the tasks. Additionally, the
communication cost of a schedule and processor allocation is defined to be the number of
(processor, node, immediate predecessor) triples (P, n, p) such that processor P first computes
node n before it has computed node p (although p must have been computed at some earlier
point by a different processor) or if it never computes p. Papadimitriou and Ullman allow
nodes to be computed more than once in this model. Relating this to other models studying
communication, we can imagine that an arc from node u to node v represents a contingent
communication cost of 1, whereas the lack of such an arc represents a contingent
communication cost of 0. If each node is computed only once, then the analogy is completely
parallel: cost $c_{uv}$ is incurred whenever nodes u and v are computed by different processors, that
is, when node v is computed by a processor that does not compute node u. Since multiple
computation of nodes is allowed, $c_{uv}$ is incurred whenever the first computation of node v is by
a processor that has not (yet) computed node u. Using this model, Papadimitriou and Ullman
derive time-communication tradeoffs for this specific kind of graph (or partial order). In
particular, any schedule computing an $n \times n$ diamond dag in time t, where t is bounded in
terms of $n^2$, and communication c must satisfy $ct = \Omega(n^3)$.

Aggarwal and Chandra also study communication-time tradeoffs for scheduling dags
[Aggarwal 1985]. Their model uses a concurrent read exclusive write PRAM in which each
processor has a local memory and all processors share a global memory. The computation is
divided into computation and communication steps. In a computation step, each processor can
access two local memory addresses, whereas in a communication step, each processor can access
one global memory address. All inputs are in the global memory, and all final outputs must be
returned to the global memory. Aggarwal and Chandra use this model to study minimum
communication delays, i.e., the minimum number of communication steps necessary, for
particular computations; for instance, when two $n \times n$ matrices are multiplied using only scalar
multiplications and additions, there is a bound on communication delay of $\Theta(n^2 / k^{2/3})$,
where k is the number of processors, and $k \le n^3 / \log^{3/2} n$. Aggarwal and Chandra also study
communication-time tradeoffs. They note that the tradeoff for diamond dags presented by
Papadimitriou and Ullman [Papadimitriou 1984b] is obtainable only for t = 1 or t = n.

From this point forward I will use the term schedule to refer to the time schedule and the
processor allocation simultaneously. Afrati et al. address the problem of finding a minimum
time and communication schedule for any given graph [Afrati 1985]. They use a model similar
to that of Papadimitriou and Ullman except that they do not limit the graph to diamond dags
and they allow each node to be computed only once. Thus, for a given graph G = (V, A) and
schedule D the communication cost is defined to be the number of pairs of nodes (u, v) such
that $u, v \in V$, $(u, v) \in A$, and u and v are computed by different processors in D. This
parallels Papadimitriou and Ullman's definition of communication when multiple computation
of nodes is not allowed. Since scheduling in general is NP-complete whenever k > 3 and the
partial order on the nodes is arbitrary, we cannot expect better for minimum time and
communication scheduling. Afrati et al. show that with two processors on a general graph, the
problem is still NP-complete. Furthermore, if the graph is a tree and the number of processors is
unlimited, the problem is also NP-complete.
CHAPTER 2

DEFINITIONS AND NOTATIONS


2.1  The  Model 

For the balance of this work, I consider a multiple task job to be modeled as a directed
acyclic graph, or dag. In this graph, each node represents a task, or part of the overall job to be
computed, and all tasks require equal computation time. Arcs represent precedence constraints.
If an arc (u, v) appears in the graph, then computation of node v depends on the result of
computation of node u; i.e., node u must be computed before node v, and the result of
computation of node u must be known by the processor computing node v. This appears to be a
reasonable form of representation, particularly in regard to a straight line program, as pointed
out by Tompa [Tompa 1980]. Computation of the graph is by a number of identical processors.
Now I introduce several definitions that will be used throughout the rest of the work to
facilitate discussion of scheduling such graphs on identical processors and the communication
cost incurred in so doing.

2.2  General  Graphs  and  Multiprocessor  Schedules 

The  first  few  definitions  regard  genera!  directed  acyclic  grapns  and  schedules  for  processing 
them.  We  let  G  =  (V  .A  be  a  dag  with  schedule  and  processor  allocation  N  on  v  processors. 

processor  schedule:  A  processor  schedule  on  k  processors  for  G  is  a  function 
/  ■'  V  — ’  { 1  2 . k  !  x  .V  ( ;V  is  tiie  '-et  of  natural  numbers!  ■-uch  that 

it  here  :s  no  u  —  v  £  V  sucit  that  '  (a  )  ■=  f  ir  ).  .,nd 

•  <  •  r  1  u  vu.  h  that  tu  1  6  \  .  ii  /  =  .<  .  /  /  v  •  =  ~i  ,.*t  '  then  ;  <  n 

proct  >cr  assigned  t-  ■  .  v  -  j  ’  ’or  ome  v  fc  V  then  we  -  .  to  .•  rr.^e  ^  r  :  -  tne 

’"'ores  >r  jssi^’hd;.  .nd  s  sc  nee  u  led  1>  ~r  ve-sec  .t  me 
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proc(v)  lore\er\  v  €  V  . /v  <>cfv  I  denotes  the  processor  assigned  to  v  in  .S’ 

n(P):  l  or  each  pr  ocessor  P .  n(PI  denotes  the  number  of  nodes  in  V  assigned  to  P  in  .S’ 

communication  node:  A  node  v  Z  \  is  said  to  be  a  c  imunication  node  if  there  is  a  node 
w  6  \'  such  that  (v  .vv  )  €  A  and  procfv  )  ^  proc(w  ). 

communication  receiver  A  node  v  €  V  is  said  to  be  a  communication  receiver  if  there  is  a 
noile  iv  6  \  such  that  (vv  ,v  )  €  A  and  procfv  )  ^  procfvv  ). 

communication  to  v:  I  or  ecerv  v  €  V  .  the  communication  to  v  is  the  number  of  nodes  vv  such 
that  ( w  ,v  )  6  A  und  procfv  )  procfvv  ). 

available node: At any time t during the computation of a graph, an available node is defined
to be any node v such that all nodes w such that (w, v) ∈ A have already been computed;
that is, if f(w) = (i, j) then j < t.
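
The definitions of this section can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the dictionary encoding of f as node -> (processor, time), and the four-node example graph are hypothetical and do not come from the original work. The available-node check follows the definition literally, so a node that has already been computed is still reported as available.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[str, str]
Schedule = Dict[str, Tuple[int, int]]  # node -> (processor index, time step)


def is_processor_schedule(arcs: Set[Arc], f: Schedule, k: int) -> bool:
    """Check conditions (i) and (ii) of the processor schedule definition."""
    slots = list(f.values())
    # Processor indices must lie in 1..k and times must be natural numbers.
    if any(not (1 <= p <= k) or t < 1 for p, t in slots):
        return False
    # (i) no two nodes occupy the same (processor, time) slot.
    if len(set(slots)) != len(slots):
        return False
    # (ii) for every arc (u, v), u is scheduled strictly before v.
    return all(f[u][1] < f[v][1] for u, v in arcs)


def available_nodes(arcs: Set[Arc], f: Schedule, t: int) -> Set[str]:
    """Nodes whose immediate predecessors were all computed before time t."""
    preds: Dict[str, Set[str]] = {v: set() for v in f}
    for u, v in arcs:
        preds[v].add(u)
    return {v for v in f if all(f[w][1] < t for w in preds[v])}


# Hypothetical example: a -> b, a -> c, b -> d, c -> d on two processors.
arcs = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
f = {"a": (1, 1), "b": (1, 2), "c": (2, 2), "d": (1, 3)}

print(is_processor_schedule(arcs, f, k=2))    # True
print(sorted(available_nodes(arcs, f, t=2)))  # ['a', 'b', 'c']
```

Moving d to slot (2, 1) would violate condition (ii), since d would then be scheduled before its predecessors b and c.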

2.3  Communication  Cost 

I present two definitions of communication cost.

Definition 1: For a dag G = (V, A) and processor schedule S, communication cost is defined to
be the number of pairs u, v such that (u, v) ∈ A and proc(u) ≠ proc(v).

Definition 2: For a dag G = (V, A) and processor schedule S, communication cost is defined to
be the number of processor-node pairs (P, v) such that processor P does not compute node v
but computes at least one direct successor of v.

Papadimitriou and Ullman [Papadimitriou 1984b] and Afrati et al. [Afrati 1985] both use
Definition 1 of communication in their work. I introduce Definition 2 as a more practical
definition of communication cost. By counting each arc (u, v) such that proc(u) ≠ proc(v) as
communication using Definition 1 we may be passing the same information between a given pair
of processors several times. For instance, if nodes u, v, and w ∈ V are such that (u, v), (u, w) ∈ A
and proc(v) = proc(w) ≠ proc(u), then the result of computing u must be communicated to
the processor computing v and w. This communication is counted in both definitions; however,
Definition 1 of communication counts it twice, whereas Definition 2 counts it only once. Definition 2
counts only the number of results that must be passed from one processor to another and
assumes that the second processor can save each value for use in processing all nodes requiring
it.
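
The difference between the two definitions can be made concrete with a short Python sketch of the u, v, w example above. The function names and the encoding of the processor allocation as a dictionary are assumptions made for illustration; they are not part of the original work.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[str, str]


def cost_definition_1(arcs: Set[Arc], proc: Dict[str, int]) -> int:
    """Definition 1: count arcs (u, v) whose endpoints are on different processors."""
    return sum(1 for u, v in arcs if proc[u] != proc[v])


def cost_definition_2(arcs: Set[Arc], proc: Dict[str, int]) -> int:
    """Definition 2: count pairs (P, u) where P does not compute u
    but computes at least one direct successor of u."""
    pairs = {(proc[v], u) for u, v in arcs if proc[u] != proc[v]}
    return len(pairs)


# u feeds both v and w; v and w share a processor that differs from u's.
arcs = {("u", "v"), ("u", "w")}
proc = {"u": 1, "v": 2, "w": 2}

print(cost_definition_1(arcs, proc))  # 2: both arcs cross processors
print(cost_definition_2(arcs, proc))  # 1: the result of u is sent to processor 2 once
```

Collecting the pairs in a set is what makes the second count ignore repeated transfers of the same result to the same processor.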

2.4  Trees 

par(v): For any node v, par(v) will denote the parent of v. In an in-directed tree T, par(v) is
the node w such that w is directly above v in T; that is, the arc (v, w) is in T.

child(v): For a node v, child(v) denotes the child of v, that is, the node w such that v =
par(w).

rc(v): In an ordered tree, rc(v) denotes the rightmost child of the node v.

lc(v): In an ordered tree, lc(v) denotes the leftmost child of the node v.

CHAPTER 3

MINIMUM  TIME  AND  COMMUNICATION  SCHEDULING 
ON  TWO  PROCESSORS 


3.1  General  Graphs  on  Two  Processors 

Although general minimum time scheduling on two processors can be done in polynomial
time [Coffman 1972], [Gabow 1982], when a communication bound is introduced the problem
again becomes more difficult. Afrati et al. [Afrati 1985] showed that with Definition 1 of
communication, this problem is NP-complete. Now I turn to Definition 2 of communication.
Whereas with Definition 1 we count communication arcs, in the two processor case using
Definition 2 we can count communication nodes. Though in general using Definition 2 of
communication a node could introduce more than one unit of communication if more than one
other processor requires information from computation of that node, in the two processor case,
only one processor other than the one computing any node might need the result. Ergo, we have
the following lemmas concerning two processor schedules on a graph G = (V, A).

Lemma  3.1 

If there are nodes u and v such that proc(u) ≠ proc(v), then any node x such that x is an
immediate predecessor of both u and v is a communication node.

Proof

If x is an immediate predecessor of u and v, then by definition, (x, u) ∈ A and
(x, v) ∈ A. If proc(x) = proc(u), then proc(x) ≠ proc(v). Similarly, if proc(x) = proc(v),
then proc(x) ≠ proc(u). In any event, x is a communication node.


Lemma  3.2 

If there are nodes u and v such that proc(u) ≠ proc(v) and there is a node x such that x
is an immediate successor of both u and v, then either u or v is a communication node.

Proof

If proc(x) = proc(u), then proc(x) ≠ proc(v). Consequently, v is a communication node.
Similarly, if proc(x) = proc(v), then u is a communication node.

□ 


Lemma  3.3 

If there are nodes u and x such that x is a successor of u (not necessarily immediate), and
proc(u) ≠ proc(x), then some node in the path from u to x is a communication node.

Proof

I use induction on the length of the path from u to x. Suppose x is an immediate successor
of u. Then by definition of communication, u is a communication node. Now suppose that
Lemma 3.3 is true for any path of length k − 1, and the path from u to x has length k. If for
some node v such that v is a child of u, v is on the path from u to x, and proc(v) ≠ proc(u),
then u is a communication node. If proc(v) = proc(u), then proc(v) ≠ proc(x), and the length
of the path from v to x is k − 1. Then by the inductive hypothesis, some node on that path is
a communication node.


Corollary 3.1

If nodes u and v such that proc(u) ≠ proc(v) have a common successor x, then at least
one node in the path from u to x or that from v to x is a communication node.

Proof

This follows directly from Lemma 3.3. If proc(u) = proc(x), then a node on the path
from v to x is a communication node. Similarly, if proc(v) = proc(x), then a node on the path
from u to x is a communication node.

□ 

Corollary 3.2

If nodes u and v such that proc(u) ≠ proc(v) have a common predecessor x, then at least
one node in the path from x to u or in that from x to v is a communication node.

Proof

This, too, follows directly from Lemma 3.3. If proc(x) = proc(u), then a node on the path
from x to v is a communication node; otherwise a node on the path from x to u is a
communication node.

Corollary 3.3

Suppose there are nodes u, v, w, and x such that v and w are in disjoint paths from u to
x. If v and w are computed by different processors, then to compute the subgraph defined by
all paths from u to x requires a minimum communication cost of 2.


Proof

By Corollary 3.1 either the path from v to x or the path from w to x has at least one
communication node, and by Corollary 3.2 either the path from u to v or that from u to w has
a communication node. Then the whole structure has at least two communication nodes, hence
communication cost is at least 2.
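
As a sanity check of Corollary 3.3 on its smallest instance, the following Python sketch enumerates every two-processor allocation of the four-node graph u -> v -> x, u -> w -> x in which v and w are on different processors, and counts the communication nodes. The graph and the helper name are assumptions chosen for illustration; longer disjoint paths are not covered by this check.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Smallest structure covered by the corollary: two disjoint paths from u to x.
arcs: List[Tuple[str, str]] = [("u", "v"), ("u", "w"), ("v", "x"), ("w", "x")]


def communication_nodes(proc: Dict[str, int]) -> Set[str]:
    """Nodes with at least one immediate successor on the other processor."""
    return {a for a, b in arcs if proc[a] != proc[b]}


costs = []
for pu, pv, px in product((1, 2), repeat=3):
    # Force v and w onto different processors, as the corollary assumes.
    proc = {"u": pu, "v": pv, "w": 3 - pv, "x": px}
    costs.append(len(communication_nodes(proc)))

print(min(costs))  # 2
```

In every allocation u is a communication node (one of v, w is on the other processor), and exactly one of v, w differs from x, which gives the bound of 2.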



Now I am ready to introduce the problem. From this point on, whenever I speak of
communication I am referring to Definition 2 of communication.

Two Processor Scheduling (2PS)

INSTANCE: Given a dag G = (V, A) and integers c and t.

QUESTION: Can G be scheduled on two processors within time t and communication c?

Theorem  3.1 

2PS is NP-complete.

Proof

To show that 2PS is NP-complete I reduce 3SAT to it. Recall that in an instance (X, S) of
3SAT we are given a set X = {x_1, x_2, ..., x_n} of variables, and a set S = {s_1, s_2, ..., s_m} of
clauses where each clause is of the form s_j = (z_{j,1}, z_{j,2}, z_{j,3}) such that z_{j,i} = x or x̄ for some
x ∈ X, for all i, j, 1 ≤ i ≤ 3, 1 ≤ j ≤ m. Furthermore, for each j, the literals z_{j,1}, z_{j,2}, and
z_{j,3} are distinct, and for no x ∈ X are both x and x̄ in the same clause. The question is
whether the variables have a truth assignment causing at least one literal in each clause to be
true.

First, for each x_i ∈ X construct a four node gadget as in Figure 2. The four nodes for each
variable are a_i, x_i, x̄_i and b_i. Node a_i is the source node, with arcs to the two center nodes x_i
and x̄_i representing the variable and its complement, which in turn have arcs to node b_i, the sink
node. Next, for each s_j ∈ S construct a five node widgit as in Figure 3. Again, there is a source
node a_{n+j}, with arcs to the three center nodes c_{j,1}, c_{j,2} and c_{j,3}. The three center nodes of each
widgit are associated with the three literals in the clause. They in turn have arcs to the
sink node b_{n+j}. Note that the numbering of the a and b nodes in the widgits continues
that of the gadgets, so that gadgets and widgits are consecutively numbered.

To construct the graph for 2PS, connect all of the gadgets followed by all of the widgits
into one column, identifying a_{i+1} with b_i, 1 ≤ i < n + m. Consequently, the source of
one gadget or widgit is identified with the sink of the previous one in the column. Add
additional arcs from the variable gadgets to the clause widgits as follows. Let s_j ∈ S be
(z_{j,1}, z_{j,2}, z_{j,3}). If z_{j,i} = x_k for some 1 ≤ i ≤ 3, 1 ≤ k ≤ n, then introduce an arc from node
x_k to node c_{j,i}. Similarly, if z_{j,i} = x̄_k for some 1 ≤ i ≤ 3, 1 ≤ k ≤ n, then introduce arc
(x̄_k, c_{j,i}). Last, add the apex AX and arcs (AX, a_i) for all 1 ≤ i ≤ n + m as well as an arc
(AX, b_{n+m}). That is, AX is a single node with arcs going to all of the source/sink nodes in the
column. This completes the graph for our instance of 2PS. Now set
t = 2n + 3m + 2, c = 2n + 2m.

[Figure 2: variable gadget; levels labeled source, center, sink]

[Figure 3: clause widgit; levels labeled source, centers, sink]

As an example, consider the following instance of 3SAT:

3SAT example: [the instance is illegible in the source; the bounds t = 23 and c = 18 given
with Figure 4 correspond to n = 6 variables and m = 3 clauses]

Figure 4 illustrates the proposed graph representing this instance of 3SAT.


To show that an instance of 2PS constructed in this way is equivalent to the original 3SAT
instance, suppose that the instance of 3SAT has a satisfying truth assignment T. First, assign
the apex and all of the a and b nodes to the first processor (P). For each variable x, if x is
assigned the value false in T then assign x to processor P for computation; otherwise, assign
x̄ to P. Assign all of the other center nodes in the gadgets to the second processor (Q). In addition,
assign to Q one of the center nodes in each clause widgit associated with a true literal. If more
than one are true then choose one arbitrarily. Assign all other widgit centers to processor P.
Thus, processor Q is assigned exactly one center node of each variable gadget and each clause
widgit and nothing else, and the value in T assigned to each of these is true. This processor
allocation can be scheduled within the required time in the following way. Processor P
computes the apex at time 1 and continues to work continuously until the entire graph is
computed. For all i, 1 ≤ i ≤ n, P computes a_i at
time 2i. Each processor computes one gadget center at time 2i + 1. For all j, 1 ≤ j ≤ m,
compute the widgit associated with clause s_j by the schedule: P computes a_{n+j} at time
2n + (3j − 1). The three center nodes are computed by the two processors at times 2n + 3j
and 2n + (3j + 1). The order of their computation is arbitrary. The final node b_{n+m} is
computed by processor P at time 2n + 3m + 2, which is the required limit.

Figure 4: Graph for 3SAT example (t = 23; c = 18)

To determine the communication cost, count the communication nodes. Since both
processors work on all centers, by Lemma 3.1 each source is a communication node. (The apex
is not a communication node.) Also, each center node computed by Q is a communication node
since it has an arc to a sink that is computed by P. The only other nodes in the graph are the
center nodes computed by P. I claim that none of these are communication nodes. Each of
these nodes has an arc to the sink of a gadget, also computed by processor P, and perhaps has
arcs to center widgit nodes. The processor assignment was such that Q only computes widgit
centers that have the value true in T and therefore have processor Q computing their
corresponding gadget centers. Then no gadget center computed by P has an arc to any widgit
center computed by Q. The widgit centers have arcs only to source/sink nodes. Since P
computes all of the a and b nodes, no widgit center computed by P is a communication node;
ergo no center node computed by P is a communication node.

Counting the communication nodes we find n + m for the sources and n + m for the
centers computed by Q. This totals 2n + 2m, which is our communication bound.
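
The construction and the processor allocation described above can be checked mechanically on a small instance. The Python sketch below is an illustration only: the node names (AX, a1, x1, nx1, c1_1, ...), the signed-integer encoding of literals, and the three-variable, two-clause instance are hypothetical and are not the example from the original work. It verifies only the communication count under Definition 2, not the timing of the schedule.

```python
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, int, int]  # +i stands for x_i, -i for its complement


def lit_node(lit: int) -> str:
    """Name of the gadget center that represents a literal."""
    return f"x{lit}" if lit > 0 else f"nx{-lit}"


def build_graph(n: int, clauses: List[Clause]) -> Tuple[Set[Tuple[str, str]], List[str]]:
    """Column of n gadgets and m widgits with shared source/sink nodes, plus apex AX."""
    m = len(clauses)
    # a_1 .. a_{n+m}, b_{n+m}: the sink of unit i is the source of unit i + 1.
    col = [f"a{i}" for i in range(1, n + m + 1)] + [f"b{n + m}"]
    centers = [[f"x{i}", f"nx{i}"] for i in range(1, n + 1)]
    centers += [[f"c{j}_{k}" for k in (1, 2, 3)] for j in range(1, m + 1)]
    arcs = {("AX", node) for node in col}
    for i, unit in enumerate(centers):
        for c in unit:
            arcs.add((col[i], c))      # source -> center
            arcs.add((c, col[i + 1]))  # center -> sink
    for j, clause in enumerate(clauses, start=1):
        for k, lit in enumerate(clause, start=1):
            arcs.add((lit_node(lit), f"c{j}_{k}"))  # literal -> widgit center
    return arcs, col


def assign_processors(n: int, clauses: List[Clause], truth: Dict[int, bool],
                      col: List[str]) -> Dict[str, str]:
    """P takes the apex and the column; Q takes one true center per gadget and widgit."""
    proc = {node: "P" for node in ["AX"] + col}
    for i in range(1, n + 1):
        proc[f"x{i}"] = "Q" if truth[i] else "P"
        proc[f"nx{i}"] = "P" if truth[i] else "Q"
    for j, clause in enumerate(clauses, start=1):
        true_pos = [k for k, lit in enumerate(clause, start=1)
                    if truth[abs(lit)] == (lit > 0)]
        chosen = true_pos[0]  # IndexError if the clause is not satisfied
        for k in (1, 2, 3):
            proc[f"c{j}_{k}"] = "Q" if k == chosen else "P"
    return proc


def communication_cost(arcs: Set[Tuple[str, str]], proc: Dict[str, str]) -> int:
    """Definition 2: pairs (processor, node) where the processor needs the node's result."""
    return len({(proc[v], u) for u, v in arcs if proc[u] != proc[v]})


n, clauses = 3, [(1, -2, 3), (-1, 2, 3)]
truth = {1: True, 2: True, 3: False}
m = len(clauses)
arcs, col = build_graph(n, clauses)
proc = assign_processors(n, clauses, truth, col)
t_bound, c_bound = 2 * n + 3 * m + 2, 2 * n + 2 * m
cost = communication_cost(arcs, proc)
print(t_bound, c_bound, cost)  # 14 10 10
```

For this instance the allocation uses exactly the 2n + 2m = 10 units of communication counted in the text: five sources and five centers computed by Q.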

Now suppose the instance of 2PS constructed above has a solution. I show that the
corresponding instance of 3SAT is satisfiable. I introduce another lemma.

Lemma 3.4

At any time in the computation of the graph, either

(i) The apex is computed,

(ii) A single source or sink node is computed, or

(iii) One or two center nodes are computed; if two, both are the center of the same
gadget or widgit.


Proof 

Since the apex is the only node of in-degree 0, only AX is computed at time 1; therefore (i)
is trivial. Since the source/sink nodes are ordered, for all i, j, if i < j, then a_i is a predecessor of
a_j, so a_i must be computed before a_j. Node b_{n+m} is a successor of all of the a nodes and,
therefore, must be computed only after the rest have been finished. Therefore, no two
source/sink nodes can be computed at the same time. Similarly, each source/sink node is either
a predecessor or successor of each of the center nodes and therefore no source/sink can be
computed simultaneously with a center node. Thus, no other node can be computed at the same
time as any single source/sink node.

Since the center nodes cannot be computed at the same time as any other kind of node (by
(i) and (ii) above), if one processor is computing a center node the other must either be idle or
also be computing a center node. If two center nodes are computed concurrently, they must
both be from the same gadget or widgit, since for every gadget or widgit y, the center nodes of
every other gadget or widgit in the column are either predecessors or successors of all nodes in
y.


Corollary 3.4

The minimum time for computing the center of a gadget is 1, the minimum for a widgit
center is 2, and to obtain these minima both processors must work.

Proof

Since each gadget has two center nodes, the only way to compute both in time 1 is for each
of the two processors to compute one of them. Similarly, since widgits have three center nodes,
one of the two processors must compute two of them, causing the minimum time to be two. If
one processor did all three the time would be three.

By Lemma 3.4 and Corollary 3.4, the minimum time for computing the graph is

      1              (apex)
    + n + m + 1      (source/sinks)
    + n              (gadget centers)
    + 2m             (widgit centers)
    = 2n + 3m + 2.


This is the time bound given in the instance of the problem. By Corollary 3.4 we see that to
obtain the minimum time for each center, both processors must work. Thus, in any solution to
this instance of 2PS, both processors are active on every center. Then by Corollary 3.3 the
communication within each gadget and widgit is 2, hence the total communication in the gadgets
and widgits is 2(n + m). This, too, is the bound expressed in our instance of 2PS. Then the
apex cannot be a communication node since all communication is used in the rest of the graph.
Call the processor computing the apex P. Since the apex is not a communication node, all
sources and sinks are also computed by P, since for all 1 ≤ i ≤ n + m the graph has arcs
(AX, a_i) as well as arc (AX, b_{n+m}). Since both processors are active on every center, all source
nodes are communication nodes. Also, every center node processed by Q is a communication node
since it has an arc to a sink computed by P. The total for the sources and the centers computed
by Q is 2(n + m). Since we cannot have any more communication,
none of the centers computed by P is a communication node.

Now, make the truth assignment as follows. We know that processor Q computes either
x_j or x̄_j but not both, for all j, 1 ≤ j ≤ n. If Q computes x_j, then assign the value true to
variable x_j; if Q computes node x̄_j, then assign false to variable x_j. Since gadget centers
computed by P are not communication nodes, Q must compute the gadget center predecessor of
every widgit center computed by Q. Consequently, with our truth assignment, each clause has
a true literal corresponding to the widgit center computed by Q. Since Q computes at least one
center node in each clause widgit, each must have at least one true literal, and the truth
assignment satisfies (X, S).

□ 

3.2  Limited  In-degree  Graphs  on  Two  Processors 

If we assume that our graph represents a problem with fine grain parallelism, i.e., each
node represents an elementary operation such as addition, it makes sense to consider graphs with
in-degree limited to two since most elementary operators are either unary or binary. Using the
elements of the previous construction with one change, I show that scheduling a graph of
in-degree 2 in minimum time and communication is also an NP-complete problem. To do so, I
introduce a new subgraph structure, the 11-node modified widgit, or modgit, which is pictured in
Figure 5. Label the nodes in the modgit A through K, starting with the source and labeling
from left to right on each level (in Figure 5). Nodes B, C, and D all have arcs coming to them
from node A. Now let nodes B through J be the center nodes with B, C, and D being the
crucial centers. Node A has in-degree 0, the crucial centers and nodes I and J have in-degree 1,
whereas all other nodes in a modgit have in-degree 2.


must be idle during computation of each of them. The other nine nodes require time at least 5
to be computed on two processors, so the whole modgit requires time 7. Figure 6 gives an
example of a schedule completing in time 7 to complete the proof.


Lemma  3.6 

To compute a modgit in minimum time on two processors, each processor must compute at
least one of the crucial center nodes.

Proof

To maintain minimum time, a processor can be idle for only one time unit other than times
1 and 7. After node A is computed, nodes B, C and D are the only available nodes.
Furthermore, no other node becomes available until both B and D have been computed. If all
three of the crucial centers are computed by the same processor, then the second processor is idle
during computation of both B and D, which renders minimum time unattainable.


Figure 6: A processor schedule for a Modgit


Lemma 3.7

Computation of a modgit in minimum time on two processors requires communication cost
at least 5.

Proof 

I will show this by counting communication nodes. By Lemma 3.6, at least one of the
crucial centers is computed by each processor, so one processor computes one crucial center and
the other computes two. Then by Lemma 3.1, node A is a communication node. We now have
two cases.

Case 1: Both B and D are computed by the same processor.

Without loss of generality let P compute B and D. Node A must be computed at time 1.
Both nodes B and D must be computed before either E or F can be computed. Therefore
neither E nor F can be computed before time 4, since P must use times 2 and 3 to compute
nodes B and D. For the modgit to be computed in time 7, nodes G and H must then be
computed at time 5, I and J at time 6, and node K at time 7. Therefore one node of each pair
(E, F), (G, H) and (I, J) must be computed by P, and the other by Q. This results in all of
the crucial center nodes being communication nodes by Lemma 3.1, since each one has two
immediate successors computed by different processors. Similarly, by Lemma 3.2, either I or J
must be a communication node since K is a direct successor of both of them. Then there are at
least 5 communication nodes.




By Lemma 3.1 node C is a communication node, and by Corollary 3.3, at least one of
nodes G, H, I, and J must be a communication node.

Case 2.1.1: proc(E) = proc(F)

If proc(F) = proc(G), then proc(F) ≠ proc(H), and vice versa. In either case one of nodes
E and F is a communication node. Then the communication nodes are A, C, and at least one
each of the sets {B, D}, {E, F}, and {G, H, I, J}.

Case 2.1.2: Nodes E and F are computed by different processors.

By Lemma 3.1, with u = E, v = F, and x = B and D, both B and D are communication
nodes. Then the communication nodes are A, B, C, D, and one of the set {G, H, I, J}.

Case 2.2: Nodes G and H are computed by Processor P.

Processor Q must compute at least four of the nodes B through J. Consequently, Q must
compute at least three of the nodes E, F, I and J, since it computes only one of the other nodes
(B). In particular, Q must compute at least one immediate predecessor of G or H, and at least
one of their immediate successors. Then the communication nodes are A, and at least one of
each of the pairs (B, C), (E, F), (G, H), and (I, J).

Case 2.3: Both nodes G and H are computed by Processor Q.

By the definition of communication, C is a communication node.

Case 2.3.1: Nodes E and F are computed by the same processor.

First note that if proc(E) = proc(F), then proc(E) = proc(F) = Q because otherwise we
cannot achieve the time bound. Thus, nodes B, E, F, G, and H are all computed by Processor
Q. Then Processor P must compute both I and J. Then communication nodes are
A, C, D, G, and H.

Case 2.3.2: Nodes E and F are computed by different processors.

Again by Lemma 3.1, both B and D are communication nodes. Then the communication
nodes are A, B, C, D, and either E or F.

In any event, communication of at least 5 is incurred and thus the Lemma is proved.


We are now ready for the statement of the problem.

Two Processor Scheduling of Graphs with In-degree Limited to 2 (2PS-2)

INSTANCE: Given a directed acyclic graph G = (V, A) with maximum in-degree 2 and
integers t and c.

QUESTION: Can G be scheduled on two processors in time no more than t with
communication cost no more than c?


Theorem 3.2

2PS-2 is NP-complete.


Proof 

I will show this using a proof paralleling that used for 2PS. Recall that I reduced 3-SAT
to that problem.

To reduce 3-SAT to 2PS-2, again construct a 4-node gadget for each of the n variables (as
before). For each of the m clauses, construct a modgit. To make the entire graph, again chain
all of the gadgets and modgits in a column; however, in this case do not equate the sink of one
with the source of the next. That is, for each 1 ≤ i < n + m add an arc from the sink of the
i-th gadget or modgit in the column to the source of the next one. Then no node has in-degree
more than 2, and the sources of all gadgets and modgits have in-degree no more than 1, as do the
crucial center nodes of the modgits. Now add one new node, the subsink SS, and an arc to SS
from the sink of the last modgit. Again add the apex AX, with arcs from AX to all of the source
nodes (increasing their in-degree to no more than 2) and an arc to SS giving it in-degree of 2.
In each modgit, associate one of the crucial centers with each literal in the clause, that is,
B represents z_{i,1}, C represents z_{i,2} and D represents z_{i,3} for all i, 1 ≤ i ≤ m. Next, as in the
case with 2PS, connect the gadgets to the crucial center nodes of the modgits. If z_{i,k} = x_j for some
1 ≤ i ≤ m, 1 ≤ k ≤ 3, 1 ≤ j ≤ n, then add arc (x_j, z_{i,k}). Similarly, if z_{i,k} = x̄_j for some
1 ≤ i ≤ m, 1 ≤ k ≤ 3, 1 ≤ j ≤ n, add arc (x̄_j, z_{i,k}) where, in both cases, z_{i,k} means the node
representing z_{i,k} in the modgit. Now each crucial center node has one additional arc going to it,
bringing the in-degree to 2. Let t = 3n + 7m + 2, c = 2n + 5m. As an example, recall the 3SAT
example. For the limited in-degree case, the graph looks something like Figure 7. Note again
that this is the minimum possible value for t since each gadget must be totally computed in
time no less than 3 before the next can be started, and similarly with the modgits, in time no
less than 7. The apex and subsink account for the other two time units. Similarly, c is the
minimum possible communication within that time since each gadget requires communication 2
to achieve minimum time, and by Lemma 3.7 each modgit requires communication 5.
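
The arithmetic behind the two pairs of bounds can be written out as a short Python sketch. The function names are hypothetical. The 2PS-2 time bound is assembled from the per-unit figures stated in the text (3 per gadget, 7 per modgit, plus one unit each for the apex and the subsink), and the sample values n = 6, m = 3 are the ones implied by the bounds t = 23, c = 18 shown with Figure 4.

```python
from typing import Tuple


def bounds_2ps(n: int, m: int) -> Tuple[int, int]:
    """(t, c) for the 2PS reduction: shared source/sink column of n gadgets and m widgits."""
    t = 1 + (n + m + 1) + n + 2 * m  # apex + source/sinks + gadget centers + widgit centers
    c = 2 * (n + m)                  # 2 units of communication per gadget or widgit
    return t, c


def bounds_2ps2(n: int, m: int) -> Tuple[int, int]:
    """(t, c) for the 2PS-2 reduction: separate sinks, modgits instead of widgits."""
    t = 3 * n + 7 * m + 2  # 3 per gadget, 7 per modgit, plus apex and subsink
    c = 2 * n + 5 * m      # 2 per gadget, 5 per modgit (Lemma 3.7)
    return t, c


print(bounds_2ps(6, 3))   # (23, 18)
print(bounds_2ps2(6, 3))  # (41, 27)
```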

Now suppose there is a satisfying assignment for the 3-SAT problem. Then the graph can
be scheduled within the time and communication bounds as follows. As in 2PS, let P compute
the apex and all source and sink nodes. Furthermore, if a variable x is assigned true in the
satisfying assignment, then P computes node x̄ while Q computes node x. Otherwise, P
computes node x and Q computes node x̄. Again, pick one true literal from each clause. If the
literal picked for a particular clause is z_{i,2}, i.e., it corresponds to node C, schedule the clause as
in Figure 8. Communication within the modgit is 5 with communication nodes A, B, C, D,
and I. If, on the other hand, the chosen node is B or D, schedule the modgit as in Figure 9.
Again, communication within the modgit is 5 with communication nodes A, B, C, D, and J.

Figure 8: Schedule if C is "chosen" for Processor Q

Figure 9: Schedule if B or D is "chosen" for Processor Q


Because all center gadget nodes computed by P go to crucial center modgit nodes computed by
P, no gadget center computed by P is a communication node, although all those computed by Q
are. As in Theorem 3.1, the graph is scheduled by first computing the apex and then keeping P
constantly active while Q works one center gadget node and 4 center modgit nodes for each
variable and clause respectively.

Conversely, suppose that the graph can be scheduled within time 3n + 7m + 2 and
communication 2n + 5m. I show that the corresponding instance of 3SAT is satisfiable. As I
pointed out, for this graph this is the minimum possible time and the minimum communication
for that time. Then each gadget incurs communication 2, each modgit incurs communication 5,
and neither the apex nor any of the sink nodes can be communication nodes. Then AX, all
sources, all sinks, and SS must be computed by the same processor; without loss of generality,
let it be P. Again, the gadget centers computed by P cannot be communication nodes.
Consequently, P must compute every crucial center modgit node whose immediate predecessor
gadget center is computed by P. By Lemma 3.6, however, at least one crucial center of each
modgit must be computed by Q. Since no center gadget node computed by P is a
communication node, the (gadget) immediate predecessor of the crucial center computed by Q
must also be computed by Q (see the end of the previous proof). Again make the following
truth assignment. If Processor Q computes node x for the variable x, assign true to variable x;
otherwise assign the value false to x. As before, this completes the proof, since the literal
associated with the modgit crucial center computed by Q in each case must have the value true.


CHAPTER  4 


SCHEDULING  COMPLETE  BINARY  TREES 
ON  TWO  PROCESSORS 


Hu [Hu 1961] designed a polynomial time algorithm for scheduling computation trees in minimum time regardless of the number of processors, whereas Afrati et al. [Afrati 1985] showed that to minimize communication within that time frame is an NP-complete problem. Moreover, I have shown that scheduling a general graph on two processors in minimum time and communication (within the time frame) is an NP-complete problem. How difficult is it, then, to schedule a tree on two processors in minimum time with minimum communication cost?

I will address only indirected trees. For these trees, since each node except the root has outdegree 1, Definitions 1 and 2 of communication cost are equivalent. In an effort to gain some insight into the problem of scheduling computation trees on two processors in minimum time and communication, we study special kinds of trees, specifically complete binary and ternary trees. Chapter 4 investigates complete binary trees, whereas Chapter 5 examines complete ternary trees.

Complete  binary  trees  are,  in  fact,  quite  easy  to  schedule  in  minimum  time  and 
communication  on  two  processors. 


Theorem  4.1 


The complete binary tree $B_k$ of height $k$ can be scheduled on two processors in time $2^k$ with communication cost of 1. Furthermore, computation of $B_k$ requires at least time $2^k$, and communication cost 1 to achieve this time.


Proof 


First consider optimality. Observe that $B_k$ has $2^{k+1} - 1$ nodes. The time to compute it on two processors, then, must be at least $\lceil (2^{k+1} - 1)/2 \rceil = 2^k$.

To determine the minimum communication, suppose processor Q computes the root x, and processor P computes some other node $v \in B_k$. This must be true for some node v since to optimize time each processor must compute half of the non-root nodes. By Lemma 3.3, at least one node on the path from v to x is a communication node, and communication is thus at least 1.

The schedule for computing $B_k$ in optimal time and communication is simple. For each node v, let proc(v) = P if v is in the right subtree of the root, proc(v) = Q otherwise. Processor Q computes the root. This guarantees communication of 1 since rc(root) is the only communication node. Each processor computes the nodes in its subtree one at a time from the leftmost lowest node to the root of the subtree. Since each subtree has $2^k - 1$ nodes, this part of the computation requires time $2^k - 1$. At time $2^k$, Q computes the root.
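
The schedule in this proof can be checked mechanically. The following Python sketch is only an illustration: the heap-style node numbering, the function names, and the assumption that a value computed in step t is available to either processor from step t + 1 onward are choices made for this example, not part of the report.

```python
def schedule_complete_binary_tree(k):
    """Return {node: (processor, time)} for the complete binary tree of height k.

    Assumption of this sketch: nodes are numbered heap-style, the root is 1
    and the children of node i are 2*i and 2*i + 1.
    """
    if k < 1:
        raise ValueError("height must be at least 1")
    last = 2 ** (k + 1) - 1  # total number of nodes

    def postorder(node):
        # Children before parent, left subtree first.
        if node > last:
            return []
        return postorder(2 * node) + postorder(2 * node + 1) + [node]

    plan = {}
    # Q takes the left subtree (rooted at 2), P the right subtree (rooted at 3).
    for proc, subtree_root in (("Q", 2), ("P", 3)):
        for step, node in enumerate(postorder(subtree_root), start=1):
            plan[node] = (proc, step)
    plan[1] = ("Q", 2 ** k)  # Q computes the root in the final step
    return plan


def check_schedule(k):
    """Return (makespan, number of tree edges joining nodes on different processors)."""
    plan = schedule_complete_binary_tree(k)
    last = 2 ** (k + 1) - 1
    assert len(plan) == last                 # every node is scheduled
    assert len(set(plan.values())) == last   # one node per processor per step
    crossing = 0
    for node in range(2, last + 1):
        parent = node // 2
        # Precedence: a child finishes strictly before its parent is computed.
        assert plan[node][1] < plan[parent][1]
        if plan[node][0] != plan[parent][0]:
            crossing += 1
    return max(t for _, t in plan.values()), crossing
```

For every height tried, `check_schedule(k)` reports a makespan of `2 ** k` and a single crossing edge, the one from the root of the right subtree to the root.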
CHAPTER 5

SCHEDULING  COMPLETE  TERNARY  TREES 
ON  TWO  PROCESSORS 


5.1 An Upper Bound for Communication Cost (Schedule)

Complete ternary trees offer more of a challenge and consequently more insight into the overall difficulty of scheduling general trees on two processors than do binary trees. Since each internal node has an odd number of children, we cannot minimize communication by simply splitting the tree down the middle as with binary trees. Instead, I present upper and lower bounds on communication for computing a complete ternary tree in minimum time. These bounds differ only by an additive constant.

First define the function $N(j)$ to be $\frac{3^{j+1} - 1}{2}$. This is the number of nodes in a complete ternary tree of height $j$.

Theorem  5.1 

Let T be the complete ternary tree of height $k \ge 2$ and let $h$ be the largest integer such that $k \ge h + N(h)$. Then T can be scheduled in minimum time with communication cost at most $k - h + 1$.
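
To make the quantities in Theorem 5.1 concrete, the sketch below computes $N(j)$, the integer $h$, the slack $L = k - h - N(h)$ used in the proof, and the resulting bound $k - h + 1$. The function names are invented for this illustration.

```python
def N(j):
    """Number of nodes in a complete ternary tree of height j: (3^(j+1) - 1) / 2."""
    return (3 ** (j + 1) - 1) // 2


def theorem_5_1_parameters(k):
    """Return (h, L, bound) for a complete ternary tree of height k >= 2.

    h is the largest integer with k >= h + N(h), L = k - h - N(h),
    and bound = k - h + 1 is the communication bound of Theorem 5.1.
    """
    if k < 2:
        raise ValueError("Theorem 5.1 assumes k >= 2")
    h = 0
    while (h + 1) + N(h + 1) <= k:
        h += 1
    L = k - h - N(h)
    return h, L, k - h + 1
```

For example, `theorem_5_1_parameters(14)` gives `h = 1`, `L = 9`, and a bound of 14, while height 15 is the first height for which `h = 2`.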


Proof 

Let

$L = k - h - N(h) \ge 0$.

By definition of $h$, $k < h + 1 + N(h+1)$, hence $L < 3^{h+1} + 1$. Now picture the tree T as in Figure 10. Each $T_j$ has height $k - 1 - j$ and the height of each center node $C(j)$ is $k - j$. Let


b  =  \Ui  )  + 


1.  and  construct  the  schedule  in  the  following  way.  Let  processor  P 




compute $T_{k-h-1}(\text{middle})$ and subtrees $T_j(\text{left})$ for $0 \le j \le k - h - 1$. Processor Q computes subtrees $T_j(\text{right})$ for $0 \le j \le k - h - 1$. Of the center nodes, let Q compute $C(j)$ for $0 \le j \le b$ and P compute all others, i.e., $C(j)$ for $b < j \le k - h$.

l  or  the  timing,  first  notice  that  \TU(R)\  > 

1) Let processors P and Q compute nodes in their subtrees from the inside out, that is, starting with $T_{k-h-1}(u)$, $u \in \{\text{right}, \text{left}, \text{middle}\}$, and working out toward $T_0(u)$.

2)  Both  processors  continue  cor  puling  these  subtrees  until  processor  (2  has  only  kg  nodes 
remaining  in  those  trees.  These  nodes  are  all  in  7\, (right  ). 

3)  Processor  P  will  now  have  A’  (h  )  +  kg  nodes  remaining  to  be  computed  in  T  {i(left  ).  None 
of  iheC  nodes  will  yet  have  been  computed. 

4)  In  the  next  kg  lime  units.  /’  computes  the  bottom  C  nodes  while  Q  computes  the 
remaining  nodes  in  7',. (right  ). 

5  I  The  remaining  uncomputed  nodes  are  all  in  T„(lcft  )  and  in  the  center  chain  of  C  nodes. 
Processor  P  has  a  total  of  S(h  )  +  kg  nodes  remaining,  w  hile  Q  has  .V(7i  )  +  kg  +  1  nodes 

lei  l  to  be  computed.  In  the  next  .V  (  j  )  +  ~  time  units,  let  both  processors  work  on  their 

respective  nodes.  Since  none  of  the  nodes  assigned  to  processor  Q  is  a  predecessor  of  any  of  the 
nodes  assigned  to  P .  and  since  C  (0)  is  the  only  node  assigned  to  processor  Q  that  is  a  successor 
to  those  assigned  to  P .  processors  P  and  Q  can  w  ork  simultaneously  on  these  nodes  for 

\[  I 

A  (/i  '  +  I  —  time  units  ( i.e..  until  processor  P  is  finished )- 

! 2 ! 

6) Let Q finish the last one (or, if L was odd, two) C node(s).




I”.  identic  .  since  /’  computes 


A'  ( ;•  )  -  1 

_ 


and  (J  com pu  les 


.V  ( k  )  -  1 

_  . 


of  the  internal 


nodes, both of the processors are active at all times as much as possible, making time minimal. Now examine the communication cost incurred in this schedule. Because each of the right and left subtrees of all center nodes is computed entirely by one processor, the only communication nodes in the schedule are center nodes. The communication to each $C(j)$ where $0 \le j \le b$ is 1 since Q computes both $C(j)$ and two of its children: $C(j+1)$ and the (root of the) subtree $T_j(\text{right})$, whereas P computes the third child of $C(j)$. Communication is also 1 to each


!  L 


C  (  i  l .  v  / .  t>  +1  <  j  ^  b  +  j  — 
1  j  -> 


since  processor  P  computes  C(  j  )  and  the  (root  uO  subtree 


T  Ue/i  I  as  well  as  C  i  j  +  1 )  (or  T  (middle  )  w  hen  j  =  b  + 


L 


).  w  hiie  Q  computes  the  third 


child of $C(j)$. The communication to $C(b+1)$, computed by Q, is 2 since Q computes $T_{b+1}(\text{right})$ whereas P computes the other two children of $C(b+1)$. Then the total communication cost is


b  + 


+  2 


$= N(h) - 1 + L + 2 = k - h + 1$, as desired.


Hence, if complete ternary tree T has height $k = h + N(h) + L$, $L < 3^{h+1} + 1$, then T can be scheduled with communication no more than $k - h + 1$.


Notice, too, that $h < \log_3 k$, but also $h \ge \log_3 k - 2$. Then we clearly have a schedule that computes a complete ternary tree in minimum time with communication no more than $k - \log_3 k + 3$.
5.2 A Lower Bound for Communication Cost


Having established an upper bound on communication cost in scheduling complete ternary trees, I now introduce several lemmas and facts about complete ternary trees and their schedules to aid in determining a lower bound.

First, notice that any schedule that minimizes time must partition the non-root nodes into two sets whose sizes differ by at most 1, since both processors must do the same amount of work, and must place the root into one of these sets. Call any partition of the nodes into two such sets a proper partition.

5.2.1  Definitions  and  Notations 

Again note that since we are discussing indirected trees, a node's children and descendants are its predecessors (rather than its successors) in the partial ordering described by the tree. Similarly, the ancestors of a node are its successors.

For any complete ternary tree T of height k with proper partition S = (P, Q), define the following (note that all set membership is with respect to S):


For every node v in T, let $G(v) = R \in \{P, Q\}$ such that $v \in R$.

An edge (v, w) is broken if G(v) = P and G(w) = Q or vice versa. Edge (v, w) is broken to level i if node w is at level i (v is at level i + 1).

Define comm(S) to be the number of edges broken in the proper partition S. A proper partition S with a minimal comm(S) is an optimal partition.

In a similar vein, comm(S, i) denotes the total number of edges broken to all levels at or above level i.

Let P(i) denote the number of nodes in P at level i. Similarly, let Q(i) denote the number of nodes in Q at level i. Then $0 \le P(i), Q(i) \le 3^i$, $1 \le i \le k$.


Let $dP(i) = P(i) - Q(i)$ and $dQ(i) = Q(i) - P(i)$. By definition, for every $i$, both $dP(i)$ and $dQ(i)$ are odd integers.

If $G = P$ then $\bar{G}$ denotes $Q$; if $G = Q$ then $\bar{G}$ denotes $P$.

For $G \in \{P, Q\}$ and $0 \le a \le 3$, a $(G, a)$-receiver is a node in $G$ such that exactly $a$ of its children are in $\bar{G}$. If $v$ is a $(G, a)$-receiver for some $G \in \{P, Q\}$, $a \in \{0, 1, 2, 3\}$, the status of $v$ is $(G, a)$.

For $a \in \{0, 1, 2, 3\}$, an $a$-receiver is a node that is in the same set as exactly $3 - a$ of its children. Then, an $a$-receiver is either a $(P, a)$-receiver or a $(Q, a)$-receiver.

A Q-receiver is a node v such that v is a $(Q, a)$-receiver for some $1 \le a \le 3$. Similarly for a P-receiver.

Let $v(1), \ldots, v(3^i)$ denote the nodes of the $i$'th level of T, and say that $v(j)$ is an $a(j)$-receiver for all $1 \le j \le 3^i$; then $i$ is an A-receiver level if $\sum_{j=1}^{3^i} a(j) = A$.

A node v such that G(v) = P is called a P-node; a node w such that G(w) = Q is a Q-node.
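
The definitions of broken edges and receiver status translate directly into code. The sketch below is a hypothetical illustration: it stores a complete ternary tree heap-style (root 0, children of node i at 3i + 1, 3i + 2, 3i + 3) and represents a partition as a list of "P"/"Q" labels. Neither the representation nor the function names come from the report.

```python
def children(node, size):
    """Children of `node` in a complete ternary tree with `size` nodes, root 0."""
    return [c for c in (3 * node + 1, 3 * node + 2, 3 * node + 3) if c < size]


def comm(side):
    """comm(S): number of edges whose two endpoints lie in different sets."""
    size = len(side)
    return sum(side[c] != side[v] for v in range(size) for c in children(v, size))


def status(side, node):
    """Return (G, a): `node` is in G and exactly a of its children are in the other set."""
    kids = children(node, len(side))
    if not kids:
        raise ValueError("a leaf has no children, hence no receiver status")
    return side[node], sum(side[c] != side[node] for c in kids)


# A proper partition of the height-2 tree: six non-root nodes on each side.
example = list("PPPQ" "PPP" "PQQ" "QQQ")
```

In `example`, the root is a (P, 1)-receiver, node 2 is a (P, 2)-receiver, and three edges are broken in total.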

5.2.2  Elementary  Observations 

Clearly, a lower bound on the number of edges broken in a proper partition S of the non-root nodes of a complete ternary tree T is a lower bound on the communication required for computing T in minimum time. This is only a lower bound since there may be no schedule
•  -  ^  r.i  m  i;n  it  d  i  ion  'Jt  ai>o  scit!v  no’*  t  ht  ’onto*  nir..m  *  a*  ’.no 

1  ’  =  —  C  ’  -  "  '  *ie:e  e  :  —  .  >. 
For all $i$, $1 \le i \le k$, $dP(i) = -dQ(i)$.

$\left| \sum_{i=1}^{k} dP(i) \right| \le 1$

For all $k$, the complete ternary tree $T_k$ of height $k$ has $N(k)$ nodes, of which $3^k$ are leaves. Thus, more than half of the nodes of $T_k$ are leaves, so $P(k), Q(k) > 0$. Furthermore, if $k > 2$, then $P(k), Q(k) \ge 4$.

5.2.3 Schedule Transformations

First I shall force the structure of an optimal partition of a complete ternary tree T of height $k \ge 3$ into a strict form, thereby facilitating analysis. Toward this end I introduce three transformations, Remove3, Remove2, and Elim2, on such partitions. Although Remove3 and Remove2 are defined in terms of P-receivers, the transformations are symmetrical for Q-receivers.

1) Remove3(v)

Suppose S = (P, Q) has at least one (P, 3)-receiver v. Then in the following manner we transform S to S' = (P', Q') in which

i) v is a (Q', 0)-receiver, and
ii) comm(S') < comm(S).

Pick a leaf w in Q such that w is not a child of v. Then let

$P' = (P - \{v\}) \cup \{w\}$,

$Q' = (Q - \{w\}) \cup \{v\}$.

Since v is a (P, 3)-receiver, all of v's children were Q-nodes. Then all of v's children are Q'-nodes, and v is a (Q', 0)-receiver.

Now consider comm(S'). The three edges from v to v's children are no longer broken in S'. If edges (par(v), v) and (par(w), w) were not broken in S, then they are broken in S' since v and w have been switched to Q' and P' respectively. This could introduce at most two new broken edges. So

$\mathrm{comm}(S') \le \mathrm{comm}(S) - 3 + 2 < \mathrm{comm}(S)$.

Figure 11: Remove3




2) Remove2(v)

If S has at least one (P, 2)-receiver v whose parent r is in Q, then the following procedure transforms S = (P, Q) to S' = (P', Q') such that

i) v is a (Q', 1)-receiver, and
ii) comm(S') < comm(S).

Again, pick a leaf w in Q such that w is not a child of v. Then let

$P' = (P - \{v\}) \cup \{w\}$,

$Q' = (Q - \{w\}) \cup \{v\}$.

In this case, v is now a (Q', 1)-receiver since one of its children is in P' whereas the others are in Q'. Also, v receives one fewer broken arc. At most one new arc, that from w to par(w), may be broken in S'. Consequently,

$\mathrm{comm}(S') \le \mathrm{comm}(S) - 2 + 1 < \mathrm{comm}(S)$.

Note that both Remove3 and Remove2 cause the total number of broken edges to decrease. Consequently, no optimal partition can have either of the situations allowing Remove3 or Remove2 to be executed.

Figure 12: Remove2

3) Elim2(v, x)

If S = (P, Q) is a partition in which

a) v is a (P, 2)-receiver,

b) x is a (Q, 2)-receiver, and

c) $\mathrm{par}(v) \in P$; $\mathrm{par}(x) \in Q$,

then we transform S to S' = (P', Q') in which

i) v is a (Q', 1)-receiver,
ii) x is a (P', 1)-receiver, and
iii) comm(S') = comm(S),

in the following manner.

Let

$P' = (P - \{v\}) \cup \{x\}$,

$Q' = (Q - \{x\}) \cup \{v\}$.

The only edges affected are those into and out of v and x. Since only one of v's children is in P, only one arc to v is broken in S'. Similarly, one of x's incoming edges is broken in S'. Both (par(v), v) and (par(x), x) are broken in S'. Then exactly 4 of the 8 edges surrounding v and x are broken in S'. In S each of v and x had 2 broken edges surrounding it, which again totals 4. Since no other edges were affected by the transformation,

$\mathrm{comm}(S') = \mathrm{comm}(S)$.

Figure 13:
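
The edge count in Elim2 can be checked on a small instance. The sketch below is an illustration under assumptions of its own: a heap-style tree of height 3 (40 nodes, root 0), a "P"/"Q" label list as the partition, and an example partition chosen only so that conditions (a) through (c) hold for v = 4 and x = 7. The example is not a proper partition; since Elim2 exchanges one node from each set, it leaves the set sizes unchanged in any case.

```python
def children(node, size):
    """Children of `node` in a complete ternary tree with `size` nodes, root 0."""
    return [c for c in (3 * node + 1, 3 * node + 2, 3 * node + 3) if c < size]


def comm(side):
    """Number of edges whose endpoints lie in different sets."""
    size = len(side)
    return sum(side[c] != side[v] for v in range(size) for c in children(v, size))


def status(side, node):
    """Return (G, a): `node` is in G and exactly a of its children are in the other set."""
    kids = children(node, len(side))
    return side[node], sum(side[c] != side[node] for c in kids)


def elim2(side, v, x):
    """Return a new partition in which v and x have exchanged sets."""
    new_side = list(side)
    new_side[v], new_side[x] = side[x], side[v]
    return new_side


def example_partition():
    """Height-3 tree: subtrees rooted at nodes 2 and 3 go to Q, everything else to P."""
    size = 40
    side = ["P"] * size
    stack = [2, 3]
    while stack:
        node = stack.pop()
        side[node] = "Q"
        stack.extend(children(node, size))
    side[14] = side[15] = "Q"  # node 4 (parent 1, in P) becomes a (P, 2)-receiver
    side[23] = side[24] = "P"  # node 7 (parent 2, in Q) becomes a (Q, 2)-receiver
    return side
```

Applying `elim2(example_partition(), 4, 7)` turns node 4 into a (Q, 1)-receiver and node 7 into a (P, 1)-receiver while the number of broken edges stays at 6.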

5.2.4  Lemmas  and  Corollaries 

Lemma  5.1  (The  Left  Property) 

We can assume, without loss of generality, that in an optimal partition of a complete ternary tree no Q-node lies to the left of any P-node at any level.

Proof

Consider a complete ternary tree T with optimal partition S = (P, Q). For a level $i$, let $P(i) = m$, $P(i+1) = n$. Then the number of edges to level $i$ that are broken in S must be at least $|3m - n|$. That is, since the $m$ P-nodes at level $i$ have a total of $3m$ children at level $i + 1$, if $n < 3m$, then at least $3m - n$ of those children must be in Q; similarly, if $n > 3m$, then at least $n - 3m$ P-nodes on level $i + 1$ must have parents in Q. Then any partition must have at least

$\sum_{j} |3P(j) - P(j+1)|$

broken edges. This minimum can be achieved by putting the $P(i)$ leftmost nodes in P and the $Q(i)$ rightmost nodes in Q at each level $i$.
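
The claim that a left-packed partition attains the bound can be verified by direct counting. In the sketch below (function names and the positional indexing within a level are assumptions of this example), the nodes of level i are numbered 0 to 3^i - 1 from left to right, so the parent of the node at position c on the next level sits at position c // 3.

```python
def left_packed_broken_edges(p_counts):
    """Count broken edges when the P(i) leftmost nodes of each level i are in P.

    p_counts[i] is P(i) for levels i = 0..k of a complete ternary tree.
    """
    broken = 0
    for level in range(len(p_counts) - 1):
        m, n = p_counts[level], p_counts[level + 1]
        for child in range(3 ** (level + 1)):
            parent = child // 3        # position of the parent within its level
            parent_in_p = parent < m   # left-packed: the first m positions are P
            child_in_p = child < n
            broken += parent_in_p != child_in_p
    return broken


def lower_bound(p_counts):
    """Sum over levels j of |3 P(j) - P(j+1)|."""
    return sum(abs(3 * m - n) for m, n in zip(p_counts, p_counts[1:]))
```

With `p_counts = [1, 2, 4]`, both functions return 3: one broken edge to the root and two broken edges to level 1.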

Corollary  5.1  (The  Simple  Property) 

If the Left Property holds in a partition S = (P, Q) of a complete ternary tree T, then

i) No level has both a P-receiver and a Q-receiver.
ii) No level has more than one b-receiver where $b \in \{1, 2\}$.

Furthermore, no level has both a 1-receiver and a 2-receiver.

Proof 

Let T be a complete ternary tree with partition S = (P, Q) having the Left Property.

Then let some level $i$ have a P-receiver. Examine the rightmost P-receiver v. Since v is a P-receiver, at least one of its children w must be a Q-node. By the Left Property, all nodes to the right of w on level $i + 1$ are in Q. Since all Q-nodes on level $i$ are to the right of v, all of their children must also be Q-nodes. Hence none are Q-receivers and level $i$ has no Q-receiver. Ergo, no level has both a P-receiver and a Q-receiver.

To demonstrate (ii), let v be a (P, b)-receiver on some level $i$, $b \in \{1, 2\}$. Let w be the leftmost child of v and x be the rightmost child of v. Since $1 \le b \le 2$, w is a P-node, and x is a Q-node. By the Left Property, all nodes to the left of w on level $i + 1$ are P-nodes; consequently no node to the left of v on level $i$ can be a P-receiver. All nodes to the right of x on level $i + 1$ are Q-nodes; thus, all of the children of each node to the right of v on level $i$ are in Q. Therefore any P-nodes to the right of v on level $i$ are (P, 3)-receivers, and level $i$ has no other (P, 1)- or (P, 2)-receivers. The proof for (Q, 1)- and (Q, 2)-receivers strictly parallels the last argument.


partition has this form. To force the partition into the above form, I use the following algorithm.

1  Procedure FormatPartition
2  begin
3     while there are at least one (P, 2)- and one (Q, 2)-receiver above level k - 1 do
4     begin
5        v := the lowest (P, 2)-receiver above level k (that is, the one closest to the leaves)
6        x := the lowest (Q, 2)-receiver above level k
7        Elim2(v, x); Replace S with S'
8     end
9     S' := S
10 end.

Lemma  5.2 

Given an optimal partition S = (P, Q) of a complete ternary tree T as input, FormatPartition converts S to S' = (P', Q') as described in (i) through (iii) above.

Proof 

First, by Lemma 5.1, we can assume that the Left Property holds for S. The only changes made to S are Elim2 transformations. We know that comm(S) is not affected by Elim2(v, x), so (i) is obvious. I claim that the Left Property still holds after each execution of Elim2(v, x). This must be true if it holds before the execution. Consider node v. Node v is a (P, 2)-receiver. Thus exactly two of the children of v are Q-nodes. Then all nodes to the right of rc(v) are Q-nodes, and all nodes to the left of lc(v) are P-nodes. Since the partition is optimal it has no 3-receivers, so all nodes to the right of v must also be Q-nodes, and all nodes to its left are P-nodes. Elim2(v, x) converts v to a Q-node. Since no P-nodes were to its right, and no other node on that level is changed, all nodes to the right are still Q-nodes, and all nodes to its left are P-nodes. Therefore, the Left Property still holds on that level. The argument is symmetric for the level of x. Hence after each execution of Elim2(v, x), the Left and, by Corollary 5.1, Simple Properties hold.

In order to show that FormatPartition terminates, I show that the number of times the test at line 3 is executed is bounded by the height of the tree. The procedure is repeated only if both (P, 2)-receivers and (Q, 2)-receivers are present in the partition at the time of the test. If so, the 2-receiver w closest to the leaves of T is eliminated in Elim2(v, x). All nodes whose status is changed are above the level of w. Then after Elim2(v, x) (v or x = w) is executed, the lowest 2-receiver in the partition is higher in the tree than before, and none of the other transformations introduce any 2-receivers at any lower level. After the $i$-th time through the while loop, the lowest 2-receiver is no lower than level $k - i$. Consequently, since the while loop terminates if no more than one 2-receiver remains in the partition, the number of loop iterations is at most $k - 1$, and therefore it must terminate.

Upon termination, properties (i) and (ii) of Lemma 5.2 hold. Furthermore, property (iii) is the termination condition of the procedure, so property (iii) also holds. Consequently, an optimal partition can be forced into the form stated above.

□

At this point some observations about the effect of different kinds of receivers in an optimal partition are helpful. For Lemmas 5.3 through 5.6 and Corollaries 5.2 and 5.3, T is a complete ternary tree with proper partition S = (P, Q) of the form dictated in Lemma 5.2.

Lemma  5.3 

If S has no (Q, 2)-receiver at level $h$, $h < k$, and $dQ(h) \ge 1$, then $dQ(h+1) \ge dQ(h)$.


1'roof 

If level $h$ has no Q-receivers at all, then each child of every Q-node at level $h$ is a Q-node. Then

$Q(h+1) \ge 3Q(h)$.

Consequently,

$P(h+1) \le 3P(h)$.

Then

$dQ(h+1) \ge 3\,dQ(h) > dQ(h)$.

If level $h$ has a (Q, 1)-receiver then

$Q(h+1) = 3Q(h) - 1$,

$P(h+1) = 3P(h) + 1$.

Then

$dQ(h+1) = 3Q(h) - 3P(h) - 2 = 3\,dQ(h) - 2$.

Since $dQ(h) \ge 1$,

$dQ(h+1) \ge 3\,dQ(h) - 2\,dQ(h) = dQ(h)$.

By hypothesis, these are the only two cases possible.

□ 

Corollary  5.2 

If S and T are as described in Lemma 5.3, and there are levels $h$ and $g$ such that $h \le g \le k - 1$, and $dQ(h) \ge 1$, and S has no (Q, 2)-receivers at any level $i$, $h \le i \le g$, then for all levels $i$, $h \le i \le g$, $dQ(g) \ge dQ(i)$.

By Lemma 5.3 applied $g - h - 1$ times, $dQ(h) \le dQ(h+1) \le \cdots \le dQ(g)$.


Lemma  5.4 


If there is a level $j$ in T such that $dP(j) > 0$ while $dP(j-1) < 0$, then

a) $dP(j) = 1$,

b) $dQ(j-1) = 1$, and

c) level $j - 1$ has a (Q, 2)-receiver.


Proof 


By definition, level $j - 1$ can have no 3-receivers and can have at most one 1- or 2-receiver. Since $dP(j-1) < 0$ but $dP(j) > 0$, level $j - 1$ must have a Q-receiver v. If v is a (Q, 1)-receiver, then

$dP(j) = (3P(j-1) + 1) - (3Q(j-1) - 1) = 3\,dP(j-1) + 2 \le -3 + 2 = -1$.

Thus, v must be a (Q, 2)-receiver. Then

$dP(j) = (3P(j-1) + 2) - (3Q(j-1) - 2) = 3\,dP(j-1) + 4 \le -3 + 4 = 1$.

Since $dP(j) > 0$, $dP(j) = 1$, and $3\,dP(j-1) = -3$, so $dP(j-1) = -1$. Equivalently, $dQ(j-1) = 1$.


Lemma 5.5

If S has no (Q, 2)-receivers above level $k - 1$, then for any level $l$ where $1 \le l \le k$, $dP(l) = 1$ iff

a) $dP(l-1) = 1$, and

b) S has a (P, 1)-receiver at level $l - 1$.

Proof 

The proof of Lemma 5.5 has two parts. First I show sufficiency of (a) and (b).

If $dP(l-1) = 1$ and level $l - 1$ has a (P, 1)-receiver, then

$P(l) = 3P(l-1) - 1$,

$Q(l) = 3Q(l-1) + 1$, implying $dP(l) = 3\,dP(l-1) - 2 = 1$.

To show necessity of (a) and (b), assume that $dP(l) = 1$. Then since S has no (Q, 2)-receivers above level $l$, $dP(l-1) \ge 1$; otherwise, if $dP(l-1) < 1$, then $dQ(l-1) \ge 1$, hence $dQ(l) \ge 1$ (by Lemma 5.3) and thus $dP(l) \le -1$, contradicting the assumption $dP(l) = 1$.

Suppose, then, that $dP(l-1) > 1$. Then

$P(l) \ge 3P(l-1) - 2 \ge 3(Q(l-1) + 2) - 2 = 3Q(l-1) + 4$,

$Q(l) \le 3Q(l-1) + 2$.

Then

$dP(l) \ge 4 - 2 = 2$,

hence by the assumption $dP(l) = 1$, condition (a) is necessary. Then

$P(l) = 3P(l-1) + e$, $Q(l) = 3Q(l-1) - e$ for some $-2 \le e \le 2$, so

$dP(l) = 3P(l-1) - 3Q(l-1) + 2e = 3\,dP(l-1) + 2e = 3 + 2e$.

Then

$2e = -2$, implying $e = -1$.

Since $P(l) = 3P(l-1) - 1$, level $l - 1$ must have a (P, 1)-receiver on it.

□

Corollary  5.3 

If there is a level $j$ in T such that $dP(j) > 0$ while $dP(j-1) < 0$, then there are no (Q, 2)-receivers above level $j - 1$ in T.

Proof

By Lemma 5.4, there is a (Q, 2)-receiver on level $j - 1$. Consequently, by definition, there are no (P, 2)-receivers in the tree. Furthermore, since $dQ(j-1) = 1$, by applying Lemma 5.5 $j - 1$ times, we find

$dQ(j-2) = dQ(j-3) = \cdots = dQ(0) = 1$.

Furthermore, there is a (Q, 1)-receiver on each of these levels; thus no level above level $j - 1$ has a (Q, 2)-receiver.

□

Recall the definition of $N(j)$ (see Section 5.1).

Lemma  5.6 

urr  * e  '  is  optimal,  and  ’  ere  is  some  le  e!  i.  <  »  _ h  that 

’  •  /  *  \ 

2  ere  .  •  r~,  0- receiver  levels  tne-  .tan  the  ea * •**  -  elow  level  h  . 
of which $r$ is the level of the children of the lowest one, and
3) S has no (Q, 2)-receivers above level $k - 1$.

Then

a) $dQ(r) \ge 3^m$

b) $\sum_{i=h}^{r} dQ(i) \ge N(m)$

Proof 

The proof is by induction on $m$.

Suppose $m = 1$. Since S has no (Q, 2)-receivers, by Corollary 5.2, $dQ(r-1) \ge 1$. By definition of $r$, level $r - 1$ is a 0-receiver level, so $dQ(r) = 3\,dQ(r-1) \ge 3 = 3^1$ for condition (a). Since

$dQ(h) \ge 1$,

$\sum_{i=h}^{r} dQ(i) \ge 1 + 3 = 4 = N(1) = N(m)$, which is condition (b).

Now assume that the lemma is true for all $m < n$. With $m = n$, $r$ is the level of the children of the $n$th 0-receiver level. Let $r'$ be the level of the children of the $(n-1)$st 0-receiver level. By the induction hypothesis,

$dQ(r') \ge 3^{n-1}$

and

$\sum_{i=h}^{r'} dQ(i) \ge N(n-1)$.

Since $r' \le r - 1$, by Corollary 5.2, with $h = r'$, $g = r - 1$, $dQ(r-1) \ge dQ(r')$. By definition, level $r - 1$ is a 0-receiver level, so

$dQ(r) = 3\,dQ(r-1) \ge 3\,dQ(r') \ge 3(3^{n-1}) = 3^n$.

Then

$\sum_{i=h}^{r} dQ(i) \ge \sum_{i=h}^{r'} dQ(i) + dQ(r) \ge N(n-1) + 3^n = N(n)$.

□ 

5.2.5  The  Lower  Bound 

I now establish the lower bound for the number of edges broken in a proper partition of a complete ternary tree of height $k$.


Theorem 5.2

Every proper partition S = (P, Q) of a complete ternary tree T of height $k > 1$ has more than $k - \log_3 k + 1$ broken edges.




Proof 

First, consider $k = 2$. To make a proper partition of the complete ternary tree of height 2, at least three ($> 2 - \log_3 2 + 1$) edges must be broken. We see this because, since $T_2$ has 12 non-root nodes, each of the sets P and Q must have 6 non-root nodes in it. Either P or Q must have more of the nodes at level 1 in it than does the other. Without loss of generality, assume that P has more. If all of the nodes at level 1 are in P then six nodes at level 2 must be in Q, which causes six edges to be broken. If, however, two nodes at level 1 are in P, then at least one edge to the root must be broken. Furthermore, to get 5 nodes at level 2 in Q, at least two edges to level 1 must be broken; thus at least three edges are broken.
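
The base case is small enough to confirm by exhaustive search. The sketch below (a hypothetical helper using a heap-style numbering with root 0, chosen for this illustration) enumerates every proper partition of the height-2 tree and returns the smallest number of broken edges.

```python
from itertools import combinations


def min_broken_edges_height_2():
    """Minimum number of broken edges over all proper partitions of the height-2 tree."""
    size = 13                  # root 0, level 1 = nodes 1..3, leaves = nodes 4..12
    non_root = range(1, size)
    best = None
    for p_nodes in combinations(non_root, 6):   # P gets 6 non-root nodes, Q the other 6
        for root_side in ("P", "Q"):            # the root may be placed in either set
            side = ["Q"] * size
            for node in p_nodes:
                side[node] = "P"
            side[0] = root_side
            # The parent of node c is (c - 1) // 3 in this numbering.
            broken = sum(side[c] != side[(c - 1) // 3] for c in non_root)
            if best is None or broken < best:
                best = broken
    return best
```

The search returns 3, which agrees with the counting argument and exceeds $2 - \log_3 2 + 1$.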

For $k \ge 3$ I will prove Theorem 5.2 by contradiction. Let S = (P, Q) be an optimal partition of T and suppose $\mathrm{comm}(S) \le k - \log_3 k + 1$. Note that $\mathrm{comm}(S) = \mathrm{comm}(S, k-1)$ (S can have no edges broken to the leaves since there are no edges to the leaves).
By Lemma 5.2, we can assume that S has at most one 1- or 2-receiver node at any level above $k - 1$ and no 3-receivers anywhere above level $k - 1$. Because T is a complete ternary tree, each level has an odd number of nodes. Thus, for all $1 \le i \le k - 1$, $P(i) \ne Q(i)$. Without loss of generality, let $P(1) > Q(1)$.

L 

If  there  is  no  level  j  such  that  Qlj )  >  P(j)  then  ^  k  >1.  hence  the  partition 

/  =i 

is  not  proper.  Consequently,  there  must  be  at  least  one  level  j  such  that  Plj  )  >  Q(j  )  but 
P  ( j  +  1 )  <  Q  ( j  +1).  Call  any  such  level  a  switch-to-Q  level.  If  level  —  1  is  the  only 
switch-to-Q  level  in  T .  let  j  —  k  —  1.  Otherwise,  let  j  be  the  switch-to-(2  level  furthest  down 
in  the  tree  other  than  k  —  1.  Now  we  have  two  cases. 
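
The choice of $j$ can be made concrete with a short sketch. It is an illustration under stated assumptions: the lists `P` and `Q` hold the per-level node counts $P(i)$ and $Q(i)$ with index 0 for the root and index $k$ for the leaves, "furthest down" is read as the largest level number, and the function names are hypothetical.

```python
def switch_to_q_levels(P, Q):
    """Return every level j (1 <= j <= k-1) with P[j] > Q[j] and P[j+1] < Q[j+1]."""
    k = len(P) - 1
    return [j for j in range(1, k) if P[j] > Q[j] and P[j + 1] < Q[j + 1]]


def choose_j(P, Q):
    """Pick j as described in the text.

    Returns k-1 if that is the only switch-to-Q level, otherwise the deepest
    switch-to-Q level other than k-1, or None if no such level exists.
    """
    k = len(P) - 1
    levels = switch_to_q_levels(P, Q)
    if not levels:
        return None
    others = [j for j in levels if j != k - 1]
    return max(others) if others else k - 1


# Height-3 tree (level sizes 1, 3, 9, 27): P leads at level 1, Q leads from level 2.
print(choose_j([1, 2, 4, 13], [0, 1, 5, 14]))  # prints 1
```

A result below $k - 1$ corresponds to Case I, and a result of $k - 1$ corresponds to Case II.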

Case I  $j < k - 1$

By Lemma 5.4, $dP(j) = 1$, and $S$ has a $(P, 2)$-receiver at level $j$ in $T$. Consequently, by
Lemma 5.2, $S$ has no $(Q, 2)$-receivers anywhere in $T$ except, perhaps, at level $k - 1$.
Furthermore, by Corollary 5.3, $S$ has no $(P, 2)$-receivers at any level above level $j$; hence $j$ is
the highest switch-to-$Q$ level. In addition, by Lemma 5.4, $S$ has no switch-to-$P$ levels above
$k - 1$ since a switch-to-$P$ level requires a $(Q, 2)$-receiver; consequently, there is no level $i$
above $j$ such that $Q(i) > P(i)$. By Lemma 5.5, since $dP(j) = 1$, $dP(j - 1) = 1$ and level
$j - 1$ has a $(P, 1)$-receiver. By applying Lemma 5.5 $j - 1$ times, we see that for all
$1 \le i < j$, $dP(i) = 1$ and level $i$ has a $(P, 1)$-receiver, and level 0 must be at least a
1-receiver level. Level $j$ has a $(P, 2)$-receiver, so $\mathrm{comm}(S, j) = j + 2$ and

$$\sum_{i=1}^{j} P(i) = \sum_{i=1}^{j} Q(i) + j.$$

Since $S$ must be a proper partition,

$$\sum_{i=j+1}^{k} P(i) = \sum_{i=j+1}^{k} Q(i) - j + \epsilon,$$

where $\epsilon \in \{-1, 0, 1\}$.

If at least one edge on each level below $j$ is broken, then $\mathrm{comm}(S, k - 1) \ge k + 1$. For
$\mathrm{comm}(S) \le k - \log_3 k + 1$, at least

$$k + 1 - (k - \log_3 k + 1) = \log_3 k$$

levels between $j$ and $k$ (exclusive) must be 0-receiver levels. Clearly, all conditions for
Lemma 5.6 with $h = j + 1$ and $m = \log_3 k$ are met. Then by Lemma 5.6,

$$\sum_{i=j+1}^{r} dQ(i) \ge N(\log_3 k) = \frac{3^{\log_3 k + 1} - 1}{2} = \frac{3k - 1}{2},$$

where $r$ is the level of the children of the lowest 0-receiver level (not including the leaves). If
$r = k$, let $l = k$; otherwise, $l = k - 1$. Then, again since $S$ has no $(Q, 2)$-receivers above level
$l$, by Corollary 5.2, $dQ(i) \ge dQ(r)$ for all $i$, $r \le i \le l$. Thus,

$$\sum_{i=j+1}^{l} dQ(i) \ge \frac{3k - 1}{2}.$$

Two  possible  subcases  arise  at  this  juncture. 

Case 1.1  $Q(k) > P(k)$

In this case,

$$\sum_{i=j+1}^{k} dQ(i) \ge \frac{3k - 1}{2}.$$

Since $S$ has at least $\log_3 k + 1$ 0-receiver levels (including the leaves), $T$ has at least that many
levels below $j$. Then

$$k \ge j + \log_3 k + 1.$$

Clearly, then,

$$\sum_{i=j+1}^{k} dQ(i) = \sum_{i=1}^{j} dP(i) + \epsilon = j + \epsilon, \quad \epsilon \in \{-1, 0, 1\}.$$

Furthermore,

$$k > 1 \text{ implies } k - \log_3 k < \frac{3k - 1}{2},$$

and

$$k \ge j + \log_3 k + 1 \text{ implies } j \le k - \log_3 k - 1,$$

so

$$j + \epsilon \le k - \log_3 k.$$

Then

$$k - \log_3 k \ge j + \epsilon = \sum_{i=j+1}^{k} dQ(i) \ge \frac{3k - 1}{2} > k - \log_3 k,$$

or

$$k - \log_3 k > k - \log_3 k,$$

which is a contradiction. Thus, in this case $\mathrm{comm}(S) > k - \log_3 k + 1$.
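
The two arithmetic facts behind this contradiction can be spot-checked numerically. The sketch below is an illustration only. It assumes $N(h) = (3^{h+1} - 1)/2$, the form of the bound quoted in the concluding chapter, and the function names are hypothetical.

```python
import math


def N(h):
    """N(h) = (3**(h+1) - 1) / 2; the numerator is always even."""
    return (3 ** (h + 1) - 1) // 2


def case_1_1_is_contradictory(k):
    """True when (3k - 1)/2 > k - log_3(k).

    When this holds, the chain k - log_3 k >= j + eps >= (3k - 1)/2
    cannot be satisfied.
    """
    return (3 * k - 1) / 2 > k - math.log(k, 3)


# The strict inequality holds for every height checked.
print(all(case_1_1_is_contradictory(k) for k in range(2, 1000)))  # prints True

# When k is a power of 3, N(log_3 k) equals (3k - 1)/2 exactly.
for m in range(6):
    k = 3 ** m
    assert 2 * N(m) == 3 * k - 1
```

The inequality reduces to $-\log_3 k < (k - 1)/2$, whose left side is negative and whose right side is positive for every $k > 1$.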

Case 1.2  $P(k) > Q(k)$

Since $j < k - 1$, we still know that $dQ(k - 1) \ge 3^{\log_3 k} = k$. Since $P(k) > Q(k)$, at
least $\frac{3^{k-1} + 1}{2}$ of the nodes at level $k - 1$ have children in $P$.

Concerning the nodes at level $k - 1$, we know that

$$3^{k-1} = 2P(k - 1) + dQ(k - 1);$$

i.e., the total number of nodes at level $k - 1$ is twice the number of $P$-nodes plus the excess
$dQ(k - 1)$ of $Q$-nodes. So at least

$$\left\lceil \frac{dQ(k - 1)}{2} \right\rceil$$

$Q$-nodes on level $k - 1$ have children in $P$; hence at least

$$\left\lceil \frac{dQ(k - 1)}{2} \times 3 \right\rceil = dQ(k - 1) + \left\lceil \frac{dQ(k - 1)}{2} \right\rceil \ge k + 1$$

edges are broken. Then the overall partition has

$$\mathrm{comm}(S) \ge j + 1 + k + 1 > k - \log_3 k + 1,$$

which is a contradiction. Therefore, in this case, $\mathrm{comm}(S) > k - \log_3 k + 1$.
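
One way to sanity-check the edge count in Case 1.2 is a direct counting sketch. It is an illustration under assumptions that are not spelled out in the text: $dQ(k-1)$ is taken to be $Q(k-1) - P(k-1)$, $P$ is taken to hold the smallest possible majority of the $3^k$ leaves, and every leaf in $P$ that cannot be placed under a $P$-parent is counted as one broken edge. The function name is hypothetical.

```python
def min_broken_leaf_edges(k, d):
    """Lower bound on broken edges between levels k-1 and k.

    k: height of the complete ternary tree.
    d: dQ(k-1), the (odd, positive) excess of Q-nodes over P-nodes at level k-1.
    """
    level_size = 3 ** (k - 1)
    assert d % 2 == 1 and 0 < d <= level_size   # level size is odd, so d is odd
    p_parents = (level_size - d) // 2           # P(k-1)
    p_leaves_min = (3 ** k + 1) // 2            # smallest majority of the leaves
    # Leaves in P beyond the capacity of the P-parents hang off Q-parents.
    return max(0, p_leaves_min - 3 * p_parents)


# With k = 4 and d = 5, at least 5 + 3 = 8 edges are broken.
print(min_broken_leaf_edges(4, 5))  # prints 8
```

For odd $d$ the result equals $d + \lceil d/2 \rceil$, which is at least $k + 1$ whenever $d \ge k$.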

Case II  $j = k - 1$; i.e., for all $1 \le i < k$, $P(i) > Q(i)$, and $P(k) < Q(k)$

By Lemma 5.4, if $P(k) < Q(k)$ while $P(k - 1) > Q(k - 1)$, at least 2 edges to level
$k - 1$ must be broken. If at least one edge is broken to each level above $k - 1$ then
$\mathrm{comm}(S) \ge k + 1$. To get $\mathrm{comm}(S) \le k - \log_3 k + 1$, at least $k + 1 - (k - \log_3 k + 1)$ levels
above $k - 1$ must be 0-receiver levels. Since $P(i) > Q(i)$ for all $1 \le i < k$, by Lemma 5.6,

$$dP(k - 1) \ge 3^{\log_3 k} = k,$$

and

$$\sum_{i=1}^{k-1} dP(i) \ge \frac{3^{\log_3 k + 1} - 1}{2}.$$

Because $S$ is a proper partition,

$$dQ(k) = \sum_{i=1}^{k-1} dP(i) + \epsilon.$$


Particularly, $dQ(k) \ge 1$. Then, by the argument given in Case 1.2, at least

[expression illegible in source]

edges are broken to level $k - 1$. Since at least as many edges are broken in the entire partition
as are broken at the lowest level, this cannot give a partition with $\mathrm{comm}(S) \le k - \log_3 k + 1$.


5.3  Comparison  of  Upper  and  Lower  Bounds 

Unsurprisingly, the upper and lower bounds presented in Sections 5.1 and 5.2 are quite
tight. They are within one of each other for all values of $k$, and coincide for many.


Corollary  5.4 

If $h$ is as defined in Section 5.1, i.e., $h$ is the largest integer such that $k \ge h + N(h)$,
the lower bound for communication cost is achievable for all trees of height $k$ such that
$k \le 3^{h+1}$ using the schedule presented.


Proof 

The proof is algebraic. For $T$ of height $k = h + N(h) + L$, the schedule presented in
Section 5.1 has communication cost $k - h + 1$. If the lower bound is achieved, then

$$k - \log_3 k + 1 < k - h + 1 \le k - \log_3 k + 2.$$

Thus,

$$\log_3 k > h \ge \log_3 k - 1,$$

$k > 3^h$, and $3^{h+1} \ge k$. For this to be so,

$$L \le \frac{3^{h+1} + 1}{2} - h.$$


For $h$ to be within the right range, $L$ must be no more than $3^{h+1} - 1$. The value computed
above lies well within this range. Thus, for


$$3^{h+1} \ge k > 3^h$$

the lower bound on communication is achievable. This is not surprising since the schedule
presented in Section 5.1 follows the format of the optimal partition presented in Section 5
very closely at least as far down as the switch-to-$Q$ level, which is as far as I define it.


□ 
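
The comparison of the two bounds can be tabulated with a short sketch. It is an illustration under stated assumptions: $N(h) = (3^{h+1} - 1)/2$ as quoted in the concluding chapter, the upper bound is $k - h + 1$, and the lower bound is read as the smallest integer strictly greater than $k - \log_3 k + 1$. Integer arithmetic is used to avoid rounding errors in the logarithm, and all function names are hypothetical.

```python
def N(h):
    """N(h) = (3**(h+1) - 1) / 2."""
    return (3 ** (h + 1) - 1) // 2


def largest_h(k):
    """Largest integer h with k >= h + N(h), for k >= 1."""
    h = 0
    while k >= (h + 1) + N(h + 1):
        h += 1
    return h


def upper_bound(k):
    """Communication cost k - h + 1 of the schedule from Section 5.1."""
    return k - largest_h(k) + 1


def lower_bound(k):
    """Smallest integer strictly greater than k - log_3(k) + 1, for k > 1.

    With m the largest integer such that 3**m < k, that integer is k + 1 - m.
    """
    m = 0
    while 3 ** (m + 1) < k:
        m += 1
    return k + 1 - m


for k in (2, 4, 9, 10, 27, 28):
    print(k, lower_bound(k), upper_bound(k))
```

For the heights tried, the two values differ by at most one, and they are equal exactly when $3^h < k \le 3^{h+1}$, which matches the range stated in Corollary 5.4.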


CHAPTER 6


CONCLUSIONS  AND  OPEN  PROBLEMS 


Scheduling  dags  on  many  processors  to  minimize  completion  time  has  long  been  known  to 
be  a  difficult  problem.  In  some  restricted  cases,  however,  the  problem  is  not  so  bleak.  When  we 
try  to  schedule  a  general  dag  on  two  processors,  or  a  tree  on  any  number  of  processors,  we  can 
do  so  in  time  polynomial  in  the  size  of  the  graph. 

Knowing this, it is reasonable to wonder how difficult the problem becomes if we introduce
a new constraint, that of minimizing communication cost, to the problem. Although scheduling
a general dag on two processors in minimum time can be done in polynomial time, I have shown
that when a communication cost constraint is added, the problem again becomes NP-complete.
Afrati et al. [Afrati 1985] show that scheduling a tree on an arbitrary number of processors is
also an NP-complete problem.

The difficulty of the problem of scheduling a tree on two processors in minimum time and
communication cost cannot be directly inferred from the previous results, and it remains open.
For complete binary and ternary trees, however, I have determined upper and lower bounds on
the communication cost of computation in minimum time.

First,  a  complete  binary  tree  can  be  computed  in  minimum  time  on  two  processors  with 
communication  cost  1.  This  is  also  the  minimum  communication  that  must  be  incurred. 

Second, I show that a complete ternary tree of height $k$ can be scheduled in minimum time with
communication cost no more than $k - h + 1$, where $h$ is defined to be the largest integer such
that $k \ge h + \frac{3^{h+1} - 1}{2}$. As a lower bound, I show that communication cost greater than
$k - \log_3 k + 1$ is required for a minimum time schedule. Comparing these bounds, we can see
that the lower bound is achievable for an infinite number of trees using the algorithm I
presented.


Several  questions  remain  to  be  solved,  primarily  concerning  scheduling  trees. 

1) Is the problem of scheduling an in-directed tree on two processors in minimum time and
communication NP-complete?

2) If so, what happens if we further restrict the tree to a binary tree? Is it easier to schedule a
binary tree in minimum time and communication?

3) How difficult is it to schedule an out-directed tree on an arbitrary number of processors in
minimum time and communication cost (using Definition 2 of communication cost)?

4) Again, if the problem is NP-complete, how difficult is it to schedule an out-directed tree on
two processors? How difficult for an out-directed binary tree?

